DOCUMENT RESUME 



ED 266 156                                        TM 860 099

AUTHOR          Burstein, Leigh; And Others
TITLE           Using State Test Data for National Indicators of
                Education Quality: A Feasibility Study. Final
                Report.
INSTITUTION     California Univ., Los Angeles. Center for the Study
                of Evaluation.
SPONS AGENCY    National Inst. of Education (ED), Washington, DC.
PUB DATE        Nov 85
GRANT           NIE-G-83-001
NOTE            275p.
PUB TYPE        Reports - Evaluative/Feasibility (142)
EDRS PRICE      MF01/PC11 Plus Postage.
DESCRIPTORS     Data Collection; *Educational Assessment;
                *Educational Quality; Elementary Secondary Education;
                Feasibility Studies; Longitudinal Studies; Mastery
                Tests; Minimum Competency Testing; National Norms;
                Pilot Projects; Public Policy; Standardized Tests;
                *State Programs; *State Surveys; *Testing Programs
IDENTIFIERS     *Educational Indicators; National Assessment of
                Educational Progress

ABSTRACT



The desire for a national picture of educational 
quality remains a continuing but unresolved goal. A question has been 
raised among high level policymakers regarding the feasibility of 
using existing data collected by the states to construct education 
indicators for state-by-state comparisons of student performance at 
the national level. A feasibility study was contracted to the UCLA 
Center for the Study of Evaluation (CSE) to explore the 
methodological and implementation issues of this approach. The 
results of the feasibility study are described and discussed in this 
report. Included in the study are analyses of: (1) the general 
characteristics of current state testing programs and of the content 
of currently used state tests; (2) alternative approaches to linking 
test results across states to create a common scale for purposes of 
comparison; and (3) the availability of auxiliary information about 
students and schools and its potential use in creating more valid 
indicators of achievement. These analyses culminated in a number of 
recommendations about ways to facilitate the use of state data for 
national comparisons. These recommendations focus on basic 
preconditions, proposed approaches, pilot study needs, auxiliary 
information collection and documentation, and strategies for 
optimizing political, institutional, and economic support. 
(Author/LMO) 



****************************************************************
* Reproductions supplied by EDRS are the best that can be made *
* from the original document.                                  *
****************************************************************




Center for the Study of Evaluation 



UCLA Graduate School of Education 

Los Angeles, California 90024 



USING STATE TEST DATA FOR NATIONAL 
INDICATORS OF EDUCATION QUALITY: 
A FEASIBILITY STUDY 



Leigh Burstein, Eva L. Baker, 
and Pamela Aschbacher 
Center for the Study of Evaluation 
University of California, Los Angeles 

J. Ward Keesling 
Advanced Technology, Inc. 



U.S. DEPARTMENT OF EDUCATION 
NATIONAL INSTITUTE OF EDUCATION 
EDUCATIONAL RESOURCES INFORMATION 
CENTER (ERIC) 

This document has been reproduced as 
received from the person or organization 
originating it. 

Minor changes have been made to improve 
reproduction quality. 

Points of view or opinions stated in this document 
do not necessarily represent official NIE 
position or policy. 



"PERMISSION TO REPRODUCE THIS 
MATERIAL HAS BEEN GRANTED BY 

[name illegible] 

TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)" 



FINAL REPORT 



USING STATE TEST DATA FOR NATIONAL 
INDICATORS OF EDUCATION QUALITY: 
A FEASIBILITY STUDY 



Leigh Burstein, Eva L. Baker, 
and Pamela Aschbacher 
Center for the Study of Evaluation 
University of California, Los Angeles 

J. Ward Keesling 
Advanced Technology, Inc. 



November, 1985 
Grant Number NIE-G-83-001 



Center for the Study of Evaluation, 
Graduate School of Education, 
University of California at Los Angeles 






ACKNOWLEDGEMENT 



This study was funded by Grant NIE-G-83-001 from the National 
Institute of Education and jointly monitored by the National 
Center for Education Statistics. However, the opinions expressed 
herein do not necessarily reflect the position or policy of 
the National Institute of Education or the National Center for 
Education Statistics, and no official endorsement by either 
agency should be inferred. 

We wish to thank Emerson J. Elliott for initiating and 
supporting the project. Conrad Katzenmeyer served as NIE project 
monitor, with the assistance of Jean Brandes of NCES in facilitating 
interagency communication. The Policy and Technical Panel 
members (Darrell Bock, Dale Carlson, Ward Keesling, Tom Kerins, 
Robert Linn, E. Roeber, Richard Shavelson, Lorrie Shepard, and 
Marshall Smith) contributed substantially to this report through 
their advice on study options, written comments, and support for a 
thorough and open exploration of complex technical and political 
issues. Bill Doherty supervised the collection of the 
Telephone Interview Survey data from State Testing Programs. 
State Testing Directors were very cooperative in providing 
information and copies of the reports they distributed. 
Victoria Gouveia, Kathy Fliller, and Melinda Baccas provided 
secretarial, administrative, and research assistance, and Ms. 
Gouveia was responsible for graphical layout and report assembly. 
Fiscal management and administration were ably provided by Dr. 
Joan Herman of CSE. 






TABLE OF CONTENTS 



Executive Summary 

Chapter 1. Project Overview 

    Purpose of Study 
    Project Activities 
    Recommendations from First Policy and Technical Panel Meeting 
    Recommendations of Second Policy and Technical Panel Meeting 
    Overview of the Report 

Chapter 2. Description of Existing State Testing Programs 

    Procedures 
    State Participation 
    Focus of Interview 
    Summary of State Testing Activities 

Chapter 3. Consideration of Common Test Linking Strategies 

    Statement of the Problem 
    Procedures for Examining Alternative Approaches 
    Basic Psychometric Alternatives 
        Matched Test Data 
        Common Anchor Items 
        Preferred Option 
    Source of Common Anchor Items 
        NAEP 
        Commercially Available Standardized Tests 
        State Developed Items 
        Other Sources 
        Preferred Option 
    Implementation Issues 
    Summary and Recommendations 

Chapter 4. Content Analysis of Existing State Tests 

    Statement of Problem 
    Procedures 
    Basic Results 
        Reading 
        Mathematics 
        Writing 
    Exemplary Practices 
    Summary and Recommendations 






Chapter 5. Examination of Reporting Practices and Auxiliary 
           Information 

    Statement of the Problem 
    Longitudinal Contrasts 
    Subgroup Contrasts 
    Current Collection and Reporting Practices 
    Summary and Recommendations 

Chapter 6. Overall Summary and Recommendation 

    Preconditions and Guiding Principles 
    Pilot Study 
    Auxiliary Information and Documentation 
    Political, Institutional, and Economic Environment 
    Cost Implications: An Addendum 

Appendices 

    1. List of Panelists for Feasibility Study 
    2. Telephone Interview Guide 
    3. Partial Summary of First Policy and Technical Panel Meeting 
    4. Decision Memorandum on the Feasibility of Using State-level 
       Data for National Educational Quality Indicators 
    5. Sources of Information about State Testing Programs 
    6. Survey Summary of General Characteristics of State Testing 
       Programs 
    7. Bock Memorandum and Panel Responses on Common Test Linking 
       Issues 
    8. Summary of Documents Provided by State Testing Programs 
    9. Master Matrices for Math, Reading, and Writing 
    10. Decision Rules for Content Analysis of State Tests 
    11. Rating Categories of "Source Quality" 
    12. Comments on Sources of Information and Quality of Information 
    13. Definition and Identification of Skills 
    14. Key to Summary Sheets 
    15. Detailed Reading Summary from Content Analysis of State Tests 
    16. Detailed Math Summary from Content Analysis of State Tests 
    17. Detailed Writing Summary from Content Analysis of State Tests 
    18. Summary of Number of Items and Subskills in Each Cell of Math 
        Matrix for Grades 4-6 and 4-9 in AL, CA, FL, LA, and PA 
    19. Coding of Reporting Practices and Auxiliary Information 
    20. Linking State Educational Assessment Results: A Feasibility 
        Trial 






STATE TESTS AS QUALITY INDICATORS PROJECT 



EXECUTIVE SUMMARY 

The desire for a national picture of educational 
quality remains a continuing but unresolved goal. Last fall, a 
question was raised among high level policymakers regarding the 
feasibility of using existing data collected by the States to 
construct education indicators for state-by-state comparisons of 
student performance at the national level. A feasibility study 
was contracted to the UCLA Center for the Study of Evaluation 
(CSE) to explore the methodological and implementation issues of 
such an approach. 

The results of the feasibility study are described and 
discussed in this report. Included in the study were analyses of 
the general characteristics of current state testing programs and 
of the content of currently used state tests; of alternative 
approaches to linking test results across states to create a 
common scale for purposes of comparison; and of the availability 
of auxiliary information about students and schools and its 
potential use in creating more valid indicators of achievement. 

These analyses culminated in a number of recommendations 
about ways to facilitate the use of state data for national 
comparisons. These recommendations focus on basic preconditions, 
proposed approaches, pilot study needs, auxiliary information 
collection and documentation, and strategies for optimizing 
political, institutional, and economic support. 

The following recommendations are made regarding basic 
preconditions and guiding principles for the use of state test 
data: 

1. The comparison of the performance of states should 
include only those states where there is sufficient empirical 
evidence to allow analytical adjustments for the effects of 
differences in testing conditions. All states that collect test 
data on the pertinent content areas at the designated grade 
levels or whose test results can be statistically adjusted to the 
targeted testing conditions should be considered for inclusion in 
cross-state comparisons. 

2. Existing state testing procedures should be disrupted as 
little as possible. Only those data collection activities 
considered essential for obtaining evidence of comparability 
should be introduced over and above the states' own planned 
expansions and extensions of their testing activities. 

3. Existing state tests and testing data should be used as 
much as possible. 






4. Regardless of the optimal specificity desired in the 
reporting of cross-state performance, the content of the tests to 
be used for comparison purposes should be specified at as low a 
level (subskill or subdomain) as possible to enhance the quality 
of the match to existing tests and to encourage attention to the 
content and detail of what is being tested. 

5. If cross-state comparisons are to be achieved through 
linking of a state's test to a common linking test, the content 
covered by the linking test should be as broad as possible both 
to ensure overlap with each state's tests and to encourage 
broadening rather than narrowing of the curriculum across the 
states. 

6. The proposed approaches for developing state-by-state 
achievement indicators should be compatible with the wider issue 
of the development of systems for monitoring instructional 
practices as well as educational progress both within and across 
the states. Desirable augmentations of current state practices 
should increase documentation of student and school 
characteristics within the framework of planned changes in state 
educational activities. 

The following recommendations are made with regard to 
optimal approaches to the problem of linking test data across 
states and the implementation of the desired approaches. 

7. A common anchor item strategy, wherein a common set of 
linking test items is administered concurrently with the existing 
state test to an "equating-size" sample of schools and students, 
should be used as the basis for expressing test scores from 
different states on a common scale. 

8. The items contributing to the common anchor set should be 
selected from multiple sources including existing state-developed 
tests, NAEP, commercially available tests, and other policy-relevant 
and technically adequate sources, such as the IEA tests. 



9. The mechanisms for establishing the skills to be included 
in the common anchor set, for selecting items to represent the 
skills, and for specifying the rules for participation by 
individual states should be developed and administered primarily 
by collective representation of the states. 

10. The organization responsible for developing and 
administering the linking effort should consider the following 
points relevant to implementation: 

a. Procedures for documenting contents of existing state 
tests should be specified so that questions of what is being 
equated to what can be addressed. 






b. Specification of content represented in common anchor 
set should be at the lowest level possible (subskill level) 
even if achievement indicators, at least initially, are to 
be reported at higher levels (skill or content area). 

c. The minimum criteria for considering an item for 
inclusion in the common anchor item set should be that 

o The item measures a skill selected for inclusion in the 
common anchor item set, and 

o Sufficient empirical evidence is available about the item 
to ascertain its behavior for the major segments of the 
student population with which it will be used. 

d. The selection of items should be made by teams of 
curriculum and testing specialists from a broad-based pool 
of items, without identification of their source insofar as is 
technically feasible. 

e. The following set of testing conditions should be 
specified: 

o Target grades and range of testing dates along with 
requirements for special studies in those states that 
normally test outside the chosen range or do not test at 
present but elect to participate. 

o Procedures for concurrent administration of the common 
anchor item set with existing state tests for the various 
alternative types of state tests (matrix sampled, state- 
developed single form, commercially developed standardized 
test). 

o Auxiliary information for checking subgroup bias and 
determining sample representativeness (for equating and 
scaling purposes). 

o Minimum sample sizes (for both schools and students). 

The following recommendation is made with regard to the need 
for pilot studies of the proposed approach: 

11. A pilot study of the proposed common test linking 
strategy should be conducted in a limited set of skill areas for 
a specific grade range in order to determine both the quality of 
the equating under preferred conditions and the effects of 
various deviations from these conditions. The content areas and 
grade levels to be used in the proposed pilot study are literal 
comprehension for reading and either numbers and numeration or 
measurement for mathematics at grades 7-9. 






The following recommendations are made with regard to the 
need for auxiliary information and documentation about student and 
school characteristics: 

12. The organization responsible for coordinating the test 
linking activities described earlier should also develop plans 
for obtaining routinely a select set of common auxiliary 
information from states about their students and schools. 

13. Cooperating states should be encouraged to provide on an 
annual basis uniform documentation describing their data 
collection activities. 

14. Cooperating states should work toward the collection 
of a common set of auxiliary information about student and school 
characteristics along with their testing data. A standard set of 
definitions for measuring the chosen characteristics should be 
determined. 

15. The organization responsible for coordinating test 
linking efforts should consider ways of contextualizing state 
test comparison data to guard against the possibility of 
unwarranted interpretations. The auxiliary information gathered 
as part of the previous recommendation should contribute to this 
activity. 

The following recommendations are made with regard to 
establishing an effective political, institutional, and economic 
environment for the indicator effort: 

16. To develop the necessary levels of political support for 
this activity, broad-based support for the idea should be 
developed. Key participants include Chief State School Officers, 
their staffs, and other state education officials; other prominent 
state officials, including the Governor, Members of Congress, and 
state legislators; and representatives of large city 
school districts, the education associations, and the private 
sector. 

17. An institutional structure for the conduct of this 
activity that relies heavily on the collective efforts of the 
states should be adopted. The Council of Chief State School 
Officers' new Assessment and Evaluation Coordinating Center 
proposal deserves consideration for this purpose. 

18. Technical assistance and oversight should be established 
to assure the technical and methodological quality of the linking 
and equating, of the content of measures, and of validity of 
interpretations. This oversight should be provided by independent 
or semi-independent panels, perhaps modeled on the panels 
advising the NAEP activity. 






19. A long-term, secure basis of financial support for 
coordinating and updating the test linking activity and the 
collection and reporting of common auxiliary information should 
be developed. This support is necessary to ensure that 
modifications in the basis of comparison and in the participating 
states can be accommodated over time while maintaining the 
integrity of the linking effort. 






Chapter 1 
Project Overview 



Purpose of Study 

Various efforts to improve the capacity for collecting and 
reporting achievement indicators of educational quality and to 
improve methods for obtaining comparable state-level performance 
data serve as both a backdrop and an impetus for this study. One 
natural consequence of both the recent concern for the quality of 
existing educational offerings and the desire to monitor the 
consequences of proposed reforms has been an expanded search for 
high quality data to inform educators and policy makers. Various 
groups have begun to search for education indicators to serve as 
benchmarks for judging educational progress and status. Former 
Secretary of Education Bell's release of his State Education 
Statistics charts with state data and state rankings on the SAT 
and ACT plus other variables is the most visible example of this 
effort. The attention it received from the press, the public, and 
various education organizations established the current climate 
in which other education indicator efforts are viewed. 

Of particular concern in the realm of indicators of 
educational performance has been the appropriate selection and 
proper use of measures of educational achievement to compare the 
accomplishments of individual states. A basic dilemma is that 
although students undergo a substantial amount of testing during 
the course of their educational careers, virtually all of this 
testing is determined by local and state policies (annual 
district standardized achievement testing, state assessments, 
minimum competency and proficiency testing) or by individual need 
and initiative (special education testing, college admissions 
examinations). While these testing activities may be suitable for 
the purposes for which they were designed, none can be readily 
translated into a uniformly acceptable achievement standard for 
comparing the quality of educational programs across states. In 
essence there exists no nationally common test that is currently 
administered in a manner that will serve such a purpose. The 
self-selection in taking the SAT and ACT makes their results a 
flawed basis for state-level comparisons. The current design for 
sample selection and administration schedule of the National 
Assessment of Educational Progress (NAEP) does not provide 
sufficiently representative or current data in most states to 
make it a suitable source for such comparisons. 

The desire for a national picture of educational quality 
remains a continuing but unresolved goal. In the past, there has 
been some resistance from States to comparative information of 
any sort. The arguments have centered on the need for good 
contextualization of information so that differences in 
performance can be properly attributed to quality of 
educational services and not to social and economic conditions in 
the regions themselves. 






A national test has been proposed periodically as a 
solution, but has been rejected because of the constitutional 
delegation of educational responsibilities to the States and the 
attendant notion that such a test would exert untoward Federal 
pressures toward uniformity in educational practices. The cost 
of such a new test (or radical expansion of the NAEP sampling and 
scheduling) would also be high. 

Last fall, a question was raised among high level 
policymakers regarding the feasibility of using existing 
mechanisms within the States to contribute to the picture of 
American educational quality. Specifically under consideration 
was the extent to which existing measures of student performance 
collected by the States could be combined to (1) provide a 
national profile of performance in achievement domains and (2) 
provide a basis for state-by-state comparisons of student 
performance. A feasibility study (hereafter referred to as the 
State Tests as Quality Indicators (STQI) Project) was contracted 
to the UCLA Center for the Study of Evaluation (CSE) to explore 
the methodological and implementation issues of such an approach. 
This report describes the activities of the STQI Project, 
summarizes project analyses, and presents recommendations 
regarding the feasibility of using existing state tests for the 
desired purposes. 

Project Activities 

The basic charge to CSE in conducting the STQI Project was 
to document existing state testing program activities with 
specific emphasis on the possibility of using data already 
routinely collected to form "comparable" state-level achievement 
indicators and to determine the analytical and psychometric 
methods necessary or potentially appropriate to generate the 
desired indicators. With respect to the latter, the original 
proposal identified four general approaches that might be 
applicable: direct equating of test content; econometric 
adjustments for selection and/or economic and socioeconomic 
conditions; equating by the use of a common test or linking 
measure; and methods that depend only on within-state information 
such as trend data and subgroup comparisons. 

To implement its charge, CSE carried out the following 
activities: 

1. Conducted a telephone interview survey of State testing 
directors to obtain information about their program 
characteristics; 

2. Examined copies of reports routinely generated by the 
State testing programs to ascertain additional details about the 
content being assessed and the procedures used for analyzing and 
reporting results; 

3. Convened two panel meetings of scholars and practitioners 
in Washington (November 29-30, 1984; April 15-16, 1985) to engage 
in a discussion of issues and options along with interested 
observers from government and professional organizations; 

4. In response to a modified charge coming out of the first 
Panel meeting, carried out a detailed content analysis of 
existing state tests (both state-developed and commercially 
developed); and 

5. Identified the nature and range of auxiliary information 
about student and school characteristics either collected or 
reported with state testing data that might serve as additional 
factors to consider with respect to the quality of a state's 
educational performance. 

The details of activities 1,2,4, and 5 are reported in 
subsequent chapters. To provide perspective on the reasons for 
these activities, it is necessary to recount the recommendations 
coming out of the two panel meetings and CSE actions in response 
to the recommendations. 

Recommendations from the First Policy and Technical Meeting 

The Policy and Technical Panel for the STQI Project (a 
complete list of panelists is provided in Appendix 1) included 
university scholars with both policy and technical expertise 
relevant to the project's focus and practitioner representatives 
from several major long-term state testing programs. The 
meetings of the Panel were scheduled in Washington so that 
representatives of the governmental agencies with interest in 
education indicators (National Center for Education Statistics, 
National Institute of Education, Office of Planning, Budget, and 
Evaluation, Office of Technology Assessment) and various 
professional organizations could participate in the discussions. 

The purpose of the first Panel meeting was to consider which 
of the available approaches for deriving indicators from state 
data were potentially useful given current testing practices, and 
thus which approaches CSE should explore in greater depth using 
reports provided by the states. As preparation for the meeting, 
CSE conducted in-depth phone interviews (Appendix 2) with 
representatives from testing programs and requested copies 
of existing reports and content specifications generated by state 
testing programs. The results of these phone interviews were 
then combined with information from other recent surveys of state 
testing activities and distributed to meeting participants. This 
information was intended to place the proposed approaches within 
a context of existing practices and aid in the effort to refine 
and focus the remaining tasks of the feasibility study. 

A partial summary of the deliberations at the first Panel 
meeting is provided in Appendix 3. While there was interest in 
all the approaches considered for combining state-level data for 
national comparative purposes, opinions of the meeting 
participants converged on using a common test linking and 
equating approach based on the administration of relevant common 
measures along with each state's own test to a sample of 
students. There was a consensus that the STQI Project should 
devote further effort to identifying and describing the 
conditions states would have to meet to develop a common scale by 
using a common test linking approach. This examination was to 
focus on technical considerations (timing, dimensionality 
characteristics of the test, sample size needed) and resource and 
time considerations. 

In addition to the recommendation on further study of the 
common linking approach, the participants recommended that CSE 
proceed with the following tasks: 

1. Complete the interviewing about state testing activities 
and develop a chart that characterizes these activities. 

2. Continue to obtain representative reports generated by 
state testing programs and conduct an analysis of their content 
with respect to the methodology used to develop, analyze, and 
report data at the state level. 

3. Conduct an examination of the content of state tests 
including analysis of both content specifications and actual 
items where feasible. 

4. Explore further the feasibility of developing summary 
Consumer Report-type indicators of trends with respect to 
diversity of content measures, complexity of skills measured, 
longitudinal changes, and subgroup differences. 

5. Attempt to provide resource and time estimates necessary 
to both pilot and fully implement the approaches judged to be 
fruitful to arrive at state-level education indicators. 

Recommendations from the Second Policy and Technical Meeting 

To implement the recommendations from the first Panel meeting, 
several activities were carried out by CSE staff and members of 
the Panel. First, to obtain a clearer statement of the technical 
options for employing the equating and linking strategies, 
R. Darrell Bock, a member of the Panel, was asked to provide a 
memorandum describing the psychometric alternatives and the 
conditions necessary to implement them. This memorandum was then 
circulated to other Panel members for their reaction prior to the 
scheduled April Panel meeting. Written feedback from other 
Panelists was distributed along with other materials prepared for 
the meeting. 

Second, CSE staff conducted a detailed examination of 
existing tests used by states. This content analysis was intended 
to provide a basis for judging whether there was sufficient 
overlap in content coverage and grade levels assessed among the 
states to actually implement a linking effort. It was also hoped 
that this activity would suggest ways to develop indicators that 
portray the diversity of content covered in existing state tests. 


The third major CSE activity was an examination of the state 
reports to determine whether there was sufficient information to 
develop within-state trend and subgroup comparisons to serve as 
indicators across states. This investigation also sought to 
establish the degree of overlap in the scales states used to 
report performance and whether states collected and/or reported 
auxiliary information about the characteristics of their students 
and schools that could be used to contextualize student 
performance. 

At the beginning of the second Panel meeting, participants 
received the available correspondence with respect to the Bock 
memorandum on technical alternatives, the draft materials from 
the detailed content analysis, a draft of the survey of auxiliary 
information collected and/or reported by states, and a draft 
outline for the final report. Using these materials, participants 
discussed the advantages and disadvantages of two alternative 
strategies for applying the common linking approach, namely: 

1. Matched test data strategy where scores from separate 
administrations of the linking test (presumably NAEP) and 
existing state tests would be matched at the pupil level; 

2. Common anchor item strategy where the linking test and 
the existing state test would be administered concurrently. 
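
As a rough illustration of what placing scores from different state tests on a common scale through concurrently administered anchor items involves, the sketch below uses simple linear (mean and standard deviation) linking. This is an assumption chosen for brevity, not the procedure specified by the Panel or by the Bock memorandum; the function name linear_link and all scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(state_scores, anchor_scores):
    """Return a function that maps a state-test score onto the anchor scale.

    Both lists come from the same linking sample of students, who took the
    state test and the common anchor items concurrently. The mapping matches
    the mean and standard deviation of the two score distributions.
    """
    mu_x, sd_x = mean(state_scores), pstdev(state_scores)
    mu_a, sd_a = mean(anchor_scores), pstdev(anchor_scores)
    if sd_x == 0:
        raise ValueError("state scores have no variance; cannot link")
    slope = sd_a / sd_x
    return lambda x: mu_a + slope * (x - mu_x)


# Hypothetical linking samples for two states with different tests.
state_a_test = [10, 12, 14, 16, 18]
state_a_anchor = [4, 5, 6, 7, 8]
state_b_test = [40, 50, 60, 70, 80]
state_b_anchor = [3, 4, 5, 6, 7]

link_a = linear_link(state_a_test, state_a_anchor)
link_b = linear_link(state_b_test, state_b_anchor)

# Average scores from each state, now expressed on the shared anchor scale.
print(round(link_a(14), 3), round(link_b(60), 3))
```

Once every participating state has such a mapping, state results can be compared on the anchor scale. A linear mapping of this kind ignores item-level information and subgroup bias checks, which is one reason an operational effort would need more careful psychometric methods.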

Two concerns needed to be addressed before a decision could 
be reached about how either linking strategy might be applied. 
First, the question of possible content of the common tests was 
raised. To that end, participants examined the content analysis 
of tests or specifications of tests from 38 responding states that 
were conducting testing programs as of Spring 1984. Based on 
these data, the panelists recommended that two or three skill 
areas at a single grade level be chosen for initial examinations 
of equating options based upon the frequency of the skill areas' 
inclusion in State measures and the frequency at which various 
grade levels were represented in State test administrations. The 
areas of literal comprehension in the reading achievement area 
and either numbers and numeration or measurement in the 
mathematics achievement area at grades 7 through 9 were 
considered most suitable for initial equating efforts. 

The second concern was the nature of the common measure 
proposed to serve as the basis for equating the disparate state 
measures. It was determined that technical procedures now exist 
that make it possible to equate tests without requiring that all 
sampled students respond to the same set of common items. 
However, the measures needed to share certain technical 
characteristics with the target measures in reading and math. 
Principal among these characteristics was unidimensionality of 
the scale. 
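
As an illustration of the matched test data idea only, the following Python sketch places scores from a state test on the scale of a linking test by simple linear (mean and standard deviation) equating, using one group of pupils who took both tests. The function names and all score values are hypothetical. Linear equating is a deliberately simple stand-in here, not the procedure the Panel had in mind, which would let different pupils answer different items.

```python
from statistics import mean, pstdev


def fit_linear_link(state_scores, link_scores):
    """Return (slope, intercept) mapping state-test scores to the linking scale.

    Assumes both lists hold scores for the same pupils in the same order
    (scores matched at the pupil level) and that both tests measure one
    common dimension. This is an illustrative simplification.
    """
    if len(state_scores) != len(link_scores) or len(state_scores) < 2:
        raise ValueError("need matched scores for at least two pupils")
    sd_state = pstdev(state_scores)
    if sd_state == 0:
        raise ValueError("state scores have no spread; cannot equate")
    # Match the standard deviations, then match the means.
    slope = pstdev(link_scores) / sd_state
    intercept = mean(link_scores) - slope * mean(state_scores)
    return slope, intercept


def to_link_scale(score, slope, intercept):
    """Express one state-test score on the linking-test scale."""
    return slope * score + intercept


# Invented scores for five pupils who took both tests.
state = [10, 12, 14, 16, 18]
link = [40, 45, 50, 55, 60]

slope, intercept = fit_linear_link(state, link)
state_mean_on_link_scale = to_link_scale(mean(state), slope, intercept)
```

Once every state's results are expressed on the linking scale in this way, state means can be compared directly. The accuracy of such comparisons still depends on how well each state test matches the linking test in content and dimensionality.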

The remainder of the discussion focussed on the source of 
items for the common linking measure. Three alternative sources 
of test items received the greatest attention: NAEP, commercially 
available standardized achievement tests, and items from state- 
developed tests. The strengths and weaknesses of each of these 
options were explored. Among those present at the end of the 
meeting, a preference was expressed for drawing primarily from a 
pool of items developed by the states as this option best limits 
the federal presence and retains states' control of the linking 
effort. However, it was recognized that all sources could 
provide items that could contribute to a broad-based linking 
effort. It was also understood that this preferred option 
required substantial cooperation among states, additional burdens 
on state testing programs, and increased testing costs that would 
have to be borne by some level of government. These factors 
might lead the affected Federal and State agencies to prefer 
expanded NAEP testing despite its drawbacks if the latter could 
be done more cost effectively. 

The Panelists felt that it would not be possible to decide 
whether the common linking strategy was feasible without 
conducting an exploratory study of the conditions that could 
affect the equating effort. Specifically, they recommended that 
the common anchor item strategy be tried on an exploratory basis 
for a two-year period, after which judgments about continuation, 
modification, or expansion could be made. 

Following the Panel meeting, CSE was expected to complete 
its examinations of tests and reports to provide as complete a 
documentation as possible to inform decision-makers and persons 
charged with implementation of the chosen option. It was agreed 
at the April Panel meeting that reporting of project results was 
to be done at two levels. A decision memorandum describing study 
purpose and procedures, options considered and recommendations 
was to be prepared for the Director of NCES.* A larger report 
that provides details of all project activities was to be 
prepared with a broader target audience of both federal and 
state officials interested in current practices in state testing 
and their potential for contributing to comparative indicators of 
education quality. 

Overview of the Report 

This report is intended to provide the detailed 
documentation of the activities carried out under the auspices of 
the STQI Project. Given the diverse interests and expertise of 
its target audiences (primarily policy makers, their staffs, and 
state testing practitioners), we have tried to separate reporting 
of the main themes in the investigation from more fine-grained 
treatment of the details of state testing practices. Much of the 
latter has been relegated to appendices. 

* A copy of the decision memorandum appears in Appendix 4. 
This memorandum was submitted July 30th. Subsequent to its 
submission, there were slight modifications in certain 
project recommendations in response to additional input from 
project panelists and state and federal officials concerned 
about education indicators. However, the main thrust of the 
final project recommendation remained consistent with the 
earlier memorandum. 

The remainder of this report is divided into four separate 
chapters on specific project activities plus a summary chapter 
and appendices. The description of existing state testing 
programs is provided in Chapter 2. This chapter describes CSE's 
procedures for obtaining the information about programs, other 
sources of information about these programs, and provides cross- 
state summaries of current practices. In Chapter 3, alternative 
approaches for using a common test linking strategy for 
expressing state results on a common scale are considered in 
detail. Included in this examination are descriptions and 
evaluations of basic psychometric alternatives, delineation of 
possible sources of test items to contribute to the linking tests 
and implementation issues associated with the preferred options 
for linking. The results of the detailed content analysis of 
existing state tests are reported in Chapter 4. In addition to 
the basic facts regarding present test contents, we attempted to 
highlight exemplary practices and to document the choice of 
content areas and grade levels for the exploratory study 
recommended by the Panelists. The project effort in documenting 
reporting practices and the collection and use of auxiliary 
information about student and school characteristics is provided 
in Chapter 5. Current practices and possibilities for reporting 
between-state comparisons of within-state longitudinal and 
subgroup performance contrasts are emphasized. In addition, 
recommendations are made for improving state practices in the 
collection and reporting of auxiliary information. 

While the above overview accurately characterizes the 
substance of our report on prevailing practices, it does little 
to place its contents in perspective with respect to either the 
forces that led to its initiation or the multitude of in-progress 
changes in state testing practices. As we see it, this project 
was initiated to inform a policy formation process wherein 
historically federal and state agencies have contended over the 
prerogatives in documenting national educational progress. At 
present, however, both levels of government (the federal through 
its annual reporting of State Education Statistics and education 
indicator efforts, the States through the actions of the Council 
of Chief State School Officers (CCSSO) endorsing cross-state 
comparisons and establishing a Center on Assessment and 
Evaluation to coordinate information on state practices and to 
support efforts to align state programs more closely) have 
initiated actions that could lead to the gathering and reporting 
of comparative state-level data on educational achievement. 

But the basis for these comparisons, the organizational and 
administrative mechanisms for compiling them, and the sources of 
support for the necessary expansions in data collection and 
reporting remain to be determined. It may well be that 
alternatives preferred on purely technical and organizational 
grounds are too costly or too politically onerous for either 
federal or state agencies, or that cost-effective alternatives 
too dramatically change the balance of roles and 
responsibilities. In either of these circumstances, the current 
will to cooperate between the federal government and the States 
in the development of national achievement indicators could well 
dissolve. If this were to come to pass, it is highly unlikely 
that the kinds of alternatives that we were charged to 
investigate could ever be implemented. Whether the country would 
be left then with present practice (i.e., SAT/ACT comparisons) or 
two competing systems is unclear; neither of these alternatives 
would seem to be desirable. 

The other major caveat that must be considered in reading 
this report is that current state-level reform efforts are 
bringing about significant changes in current state testing 
practices. If current plans on various state drawing boards are 
implemented and maintained, more students will be tested at more 
grade levels in a broader array of subject matters for a greater 
number of states. These changes could eventuate in an expanded 
base of commonality of testing practices and thus enhanced 
possibilities of using state testing data for comparative purposes. 

In the short term, however, it means that attempts to 
document existing state practices are inherently imprecise. At 
various points in our investigations, we have been forced to 
choose between describing what existed at the time of our data 
collection, what was currently being implemented, and what a 
state anticipated would happen in the near future. The state of 
Mississippi is illustrative here. According to practices prior to 
1984 (as reported in the Southern Regional Education Board's 
report on test results from the South), Mississippi operated both 
an assessment program which used a commercially available 
standardized achievement test and a minimum competency testing 
program. The Education Commission of the States' December 1984 
report on current state assessment practices cites only the 
former program. Our own sources of information portrayed a mixed 
picture of a system in transition where a state-developed test 
was planned for implementation within the next three years. As a 
result, we classified Mississippi differently depending on the 
specific issue we were attempting to address. These kinds of 
apparent inconsistencies appear throughout the chapters of the 
report although, as best we can determine, they have no impact on 
either our interpretations of the data or our study 
recommendations. 

What the active change efforts at both federal and state 
levels did mean for our project was that we found it necessary to 
adopt certain basic guiding principles about how intrusive the 
options recommended could be with respect to existing practices 
considering what was likely to occur in the near future. That 
is, since both federal and state agencies are committed to cross- 
state comparisons and state testing programs are changing, we 
thought it reasonable to consider alternatives that would require 
greater uniformity in practice than currently exists and that 
depended on multi-state cooperation to develop the desired 
achievement comparisons. At the same time, we took our charge to 
concentrate on state testing data as the basis for comparative 
indicators to mean that the preferred options should leave as 
much discretion as possible to the States collectively. To 
achieve this desired goal while ensuring that the resulting 
comparisons have a firm technical base, we assumed that the 
following basic principles should guide our examinations of 
alternative approaches for deriving comparative achievement data 
based on existing state testing programs and practices: 

1. Existing state testing procedures should be disrupted as 
minimally as possible. Only those data collection activities 
considered essential for obtaining evidence of comparability 
should be introduced, over and above the states' own planned 
expansions and extensions of their testing activities. 

2. Existing state tests and testing data should be used as 
much as possible. Thus, to the extent that is feasible, state 
test data would serve the multiple purposes dictated by both its 
original intent and the desire for cross-state comparisons. 

3. Regardless of the specificity desired in the reporting of 
cross-state performance, the content of the tests to be used for 
comparison purposes should be specified at as low a level 
(subskill or subdomain if possible) as possible to enhance the 
quality of the match of existing tests to the linking tests and 
to encourage attention to the details of what is being tested. 

4. The content covered by the linking tests should be as 
broad as possible both to ensure some degree of overlap with each 
state's tests and to encourage broadening rather than narrowing 
of the curriculum across the states. 

5. While the present project charge by necessity focuses 
discussion on state-by-state achievement indicators, the proposed 
approaches should be compatible with the wider issue of the 
development of systems for monitoring practices and progress both 
within and across the states. Augmentations of present state 
practices that encourage improvements in documenting the 
characteristics of its students and schools within the framework 
of planned changes in state educational activities at minimal 
added expense are desirable. To the extent possible, these 
augmentations should be designed to serve the dual purpose of a 
national monitoring system as well. 

In essence, we are examining the feasibility of developing a 
set of state-by-state achievement indicators that grows out of 
existing state testing activities. The resulting set of 
indicators should draw heavily from the content specifications 
and item pools collectively administered by States but by 
necessity may include content unevenly distributed among current 
state tests. Ideally, the proposed achievement indicators should 
build upon and extend the capacity of individual States to 
monitor comparatively the progress of their students within a 
broad framework of curricular objectives arrived at through 
collective and collaborative decision-making by representatives 
of the States. The purpose of this project, then, is to ascertain 
the conditions that support or impede progress toward this ideal 
and, where possible, to suggest feasible modifications and 
extensions of current testing activities to better approximate 
the intended goal of a national set of state-by-state achievement 
indicators. 






Chapter 2 

Description of Existing State Testing Programs 



A description of existing state testing programs is 
presented in this chapter. CSE's procedures for obtaining 
information about programs are described; other sources of 
information about state testing programs are identified; and 
current practices are summarized. While this description may be 
of direct interest to policy makers and practitioners, its 
primary purpose with respect to this report is to establish the 
context of existing practice within which alternatives for 
linking test results across states must be considered. For this 
reason the discussion of state testing practices will be brief 
and will focus on information that can hopefully clarify and 
refine the consequences of the test linking alternatives. 

Procedures 

Part of the basic charge to CSE in conducting the STQI 
Project was to document existing state testing program activities 
with specific emphasis on the possibility of using data already 
routinely collected to form "comparable" state-level achievement 
indicators. At the start of the project, federal personnel 
involved in education indicators work had only limited 
information about current state testing activities and viewed the 
project as an opportunity to rectify this situation. 

To complete the compilation of information about state 
testing programs in the limited time allotted for the effort, it 
was decided to conduct a telephone interview survey with 
representatives from the testing program in each state currently 
conducting such a program. (Originally, the STQI Project was to 
be carried out within a five-month period from September 1984 
through January 1985. However, the project did not actually begin 
until October 1984 and was subsequently extended in response to 
changes in objectives arising out of the Panel meetings.) A 
preliminary list of contact persons in each state was 
obtained with the assistance of the CCSSO and the state testing 
members from the project Panel. Attempts were made to contact a 
testing representative in each state; however, this was not 
possible in some states which do not currently operate testing 
programs or have anyone designated with responsibilities in this 
area. 

State participation. Most of the telephone interviews were 
conducted during the month of November 1984. By the end of the 
project, representatives from every state operating a statewide, 
state-administered testing program sometime during the 1983-85 
period were contacted. In total, testing representatives from 42 
states were interviewed and/or supplied CSE with reports and 
documents pertaining to their state testing activities. 



Four participating states (Mississippi, which disbanded one 
state testing program after 1983 and is currently implementing a 
new program; Indiana and Massachusetts, which are currently 
implementing state-administered programs for the first time; and 
New Hampshire, which had a program in the late 70's and is 
beginning a new one this year) were not administering statewide 
tests as of December 1984. Eight other states (Colorado, Iowa, 
Nebraska, North Dakota, Ohio, Oklahoma, South Dakota, and 
Vermont) do not currently administer statewide tests and did not 
provide CSE with information about their testing activities. 
Some of these states are either planning to conduct statewide 
assessments or already operate programs emphasizing voluntary 
participation or local choice of tests to administer as part of 
the program. Since our interest is in programs which uniformly 
administered a statewide test, further information about the 
programs in these states was not pursued following the initial 
round of telephone calls. 

Focus of Interviews. Information about general 
characteristics of a state's testing program, the types and 
contents of reports prepared and distributed, and the 
availability of the data for further analyses beyond those the 
state included in its reports were collected during the telephone 
interview. A copy of the telephone interview guide is contained 
in Appendix 2. In addition, copies of existing reports and content 
specifications generated by state testing programs were 
requested. The reports submitted by the states were used to 
clarify aspects of the information collected during the interview 
and to serve as a primary source for the examination of reporting 
practices (Chapter 5 of this report). 

In designing the instrument for gathering state testing 
program descriptions, a primary distinction was made between 
"assessment" and "competency" testing programs. The actual label 
attached to a given state's testing program might vary, making 
its classification ambiguous. Assessment test results are most 
often used for general program monitoring and accountability 
within the state, primarily at the school and district levels. 
Typically, these tests cover a broad base of content and include 
items with a wide range of difficulty. Many states use 
commercially available standardized tests for their assessment 
purposes. Others develop their own tests (modeled after the 
original NAEP assessments in certain states). 

Competency testing programs, on the other hand, typically 
are intended to measure whether students have acquired a set of 
skills ("competencies") viewed to be important for some 
educational or social purpose. Competency test results are most 
often used for decisions about grade promotion, high school 
graduation, early exit, and eligibility for remediation programs. 
The skills tested are generally drawn from a narrower content 
band than with assessment tests. "Basic skills" or "functional 
literacy" are emphasized with the expectation that most students 
at the grade level should have mastered the competencies being 
tested; hence 70 to 80 percent correct answers are usually 
established as the passing or mastery level on these tests. 
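
The passing rule described above amounts to comparing a pupil's proportion of correct answers with a cut score. A minimal Python sketch follows. The function name is hypothetical, and the default cut of 75 percent is an assumption chosen only because it lies within the 70 to 80 percent range cited; an actual program would set its own mastery level.

```python
def has_mastered(num_correct, num_items, cut=0.75):
    """Return True if the proportion correct meets or exceeds the cut score.

    The 0.75 default is an illustrative assumption, not a value taken from
    any particular state program.
    """
    if num_items <= 0:
        raise ValueError("a test must contain at least one item")
    if not 0 <= num_correct <= num_items:
        raise ValueError("num_correct must lie between 0 and num_items")
    return num_correct / num_items >= cut


# Hypothetical pupils on a 40-item competency test.
at_cut = has_mastered(30, 40)          # 75 percent correct
below_cut = has_mastered(29, 40)       # 72.5 percent correct
stricter = has_mastered(31, 40, 0.80)  # 77.5 percent against an 80 percent cut
```

Because the rule is a single threshold, two states with the same test but different cut scores would report different pass rates, which is one reason pass rates alone are hard to compare across states.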






When the state testing agency administers the competency 
program itself, the competency tests are usually specially 
developed rather than off-the-shelf achievement tests from 
commercial publishers. Many states operating competency testing 
programs, however, leave the choice of content and the selection 
of mastery levels to the discretion of local school districts. 
In these cases, there is a statewide competency testing 
requirement but no statewide, state-administered testing program. 
Results from states operating local option programs cannot be 
compared (through linking) with results from other states unless 
the tests administered in different locales within the state have 
first been equated. Because of these added complications, later 
discussions regarding the number of states whose programs could 
be linked exclude local option states even though our (and 
Pipho's (1984), for that matter) tabulations of existing programs 
include them. 

Some states operate both assessment and competency testing 
programs including a few cases where both programs are 
administered at the same grade level. During our interviews 
information about program characteristics was recorded separately 
for assessment and competency programs so that we are able to 
identify instances of multiple programs operating at a given 
grade in the same state. 

In the descriptions that follow, special attention will 
be paid to program characteristics that are likely to have the 
greatest impact on whether a state's test data can be used in 
the linking effort. Of particular interest are (a) the 
content areas tested (reading, mathematics, writing, and 
other (typically language arts, social studies, and science)), 
(b) grade levels tested, (c) dates of test administration (Fall, 
Winter, Spring or actual month), (d) sampling strategy (census 
(every person at a grade level without a special exemption) or 
sample (a random or stratified random sample of students or 
schools)), (e) sources of test items (internally developed or 
commercially published), and (f) indications of plans for major 
program changes. 
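
One way to picture the separate recording of assessment and competency programs, together with characteristics (a) through (f), is as one record per program. The Python sketch below is an assumption about how such records might be organized, not a description of CSE's actual files. The class name, field names, and the two states shown are hypothetical.

```python
from dataclasses import dataclass

PROGRAM_TYPES = ("assessment", "competency")


@dataclass
class ProgramRecord:
    """One statewide testing program, recorded separately by program type."""
    state: str
    program_type: str       # "assessment" or "competency"
    content_areas: set      # (a) e.g. {"reading", "mathematics"}
    grades: set             # (b) grade levels tested
    test_dates: str         # (c) season or month of administration
    sampling: str           # (d) "census" or "sample"
    item_source: str        # (e) "internal" or "commercial"
    planned_changes: str = ""  # (f) free-text note on planned changes

    def __post_init__(self):
        if self.program_type not in PROGRAM_TYPES:
            raise ValueError(f"unknown program type: {self.program_type!r}")


def dual_program_grades(records):
    """Map each state to the grades where it runs both program types."""
    grades_by_state = {}
    for rec in records:
        by_type = grades_by_state.setdefault(
            rec.state, {t: set() for t in PROGRAM_TYPES}
        )
        by_type[rec.program_type].update(rec.grades)
    overlaps = {}
    for state, by_type in grades_by_state.items():
        shared = by_type["assessment"] & by_type["competency"]
        if shared:
            overlaps[state] = sorted(shared)
    return overlaps


# Invented records for two fictional states.
records = [
    ProgramRecord("State A", "assessment", {"reading", "mathematics"},
                  {4, 8, 11}, "Spring", "census", "commercial"),
    ProgramRecord("State A", "competency", {"reading", "mathematics", "writing"},
                  {8, 10}, "Fall", "census", "internal"),
    ProgramRecord("State B", "assessment", {"reading"},
                  {3, 6}, "Spring", "sample", "internal"),
]
overlaps = dual_program_grades(records)
```

Keeping one record per program rather than one per state is what makes it possible to spot a state that tests the same grade under both programs.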

Before proceeding with the discussion of results of our 
phone interviews, it is important to note the existence of other 
recent surveys of state testing activities. A list of other 
sources of information about these programs which we identified 
during the course of our investigation is contained in Appendix 5. 
The December 1984 reports on the current status of state 
assessment and minimum competency testing programs prepared by 
staff at the Education Commission of the States (ECS; Anderson, 
1984; Pipho, 1984) and the results from the Roeber surveys of 
testing directors are most relevant to the current effort. In 
certain instances, the results of the phone interviews were 
combined with information from these other surveys to obtain a 
presumably more accurate picture of current state testing 
activities. However, in a few cases, there are differences in 
the information reported by the various surveys, most likely due 
to differences in when and how specific questions about program 
characteristics were asked. For the most part, discrepancies are 
minor and it should not matter which description is considered 
definitive. 

Summary of State Testing Activities 

The basic results from our examination of state testing 
activities are presented in a series of tables and a figure. The 
detailed summary of state-by-state program characteristics is 
reported in the table appearing as Appendix 6. Specific features 
of a state's assessment and competency programs are reported 
separately in this table. The prevalence of both types of testing 
activities is portrayed pictorially in Figure 2.1. In this 
figure local option competency programs are included only when 
the state also has an assessment program. State-by-state 
information about the dates for test administration for 
assessment and competency tests is provided in Table 2.1. 
Finally, if the distinction between assessment and competency 
testing is ignored, the pattern of content areas and grade levels 
tested across the states is as depicted in Table 2.2. 

When aggregated across all states, the main characteristics 
of state testing activities can be summarized as follows: 

1. Number of Statewide Programs — As of December 1984, 
39 states (including Mississippi) were operating at least one 
statewide testing program. 

2. Assessment Programs — 35 states were conducting 
statewide assessment programs. This number includes Mississippi 
(recently discontinued) and three states (Florida, Michigan, and 
Texas) whose programs serve both assessment and competency 
purposes according to state testing officials. Other states not 
currently conducting statewide assessments (Idaho, 
Massachusetts, and South Dakota, according to the ECS survey) 
plan to start such programs in the near future. 

3. Competency Programs — 36 states currently operate 
minimum competency testing (MCT) programs; 9 of these programs 
are local option according to our survey. (Note: The December 
1984 ECS survey conducted by Pipho identified 38 states with MCT 
programs, excluding Colorado. However, his list does not match 
ours exactly. We have excluded from our list some states where 
the testing director did not classify the program as MCT even if 
Pipho did. Also, there are some states (Massachusetts, 
Nebraska, New Hampshire, Ohio, and Vermont) which operate local 
option competency programs according to Pipho but did not 
complete the CSE interview due to the absence of a statewide, 
state-administered program.) 

4. Multiple Programs — 22 states operate both assessment 
and competency testing programs while 3 additional states use the 






TABLE 2.1 



Administration Dates for State Testing Programs 



STATE 



STATE ASSESSMENT 
TEST DATES 



COMPETENCY PROGRAM 
TEST DATES 



Alabama 

Alaska 

Arkansas 

Arizona 

California 

Connecticut 

Delaware 

Florida 

Georgia 

Hawaii 

Idaho 

Illinois 

Indiana 

Kansas 

Kentucky 

Louisiana 

Maine 

Maryland 
Michigan 
Michigan 
Minnesota 

Missouri 



Montana 




April 

Every 2 years in March 

April 

April 

April - May 

? 

March 



October (Grades 11 & 12) 



Spring 

Fall (September-October) 
Spring 



April 
March 



Grade 8 - Fall (late Nov.) 
Grade 4 - February 
Grade 11 - April 

Fall 

September - October 
Fall 

4 - Winter, 8 - Fall 
11 - Spring 



Fall 
April 






April 



October 

? 

March (once every 2 years) 



Spring (May) 

Grade 11 - April 
Grade 8 - February 



February (Starting 1985) 
April 



March 



? 
? 

Fall 



Fall 



STATE 

Nevada 

New Jersey 
New Mexico 
New York 
North Carolina 

Oregon 

Pennsylvania 

Rhode Island 

South Carolina 

Tennessee 

Texas 

Utah 

Virginia 
Washington 

West Virginia 
Wisconsin 
Wyoming 



STATE ASSESSMENT 
TEST DATES 



March 
Spring 

First Week in April 

Every Four Years; is 
going to change to 
every year, March 

March - April 

Spring (April) 

March - April 

Spring 

February 

Every Three Years in 
Spring (mid-April) 

Spring 

Grade 4 - October 
Grade 8 - February 
Grade 11 - Late April 

3-6 Spring, 9-11 Fall 

Spring 

Spring 



COMPETENCY PROGRAM 
TEST DATES 

Fall; Spring for Fall 
Failures 

Spring (March) 

Spring 

Spring 

Spring - Field Test 
2 x (Oct. & May) + June 
for Seniors Only 



March - April 
Fall (November) 
March - April 
? 

February 



February 



Spring 






TABLE 2.2 



OVERVIEW OF CONTENT TESTED BY GRADE LEVEL 



Key 

R = reading 
M = math 
W = writing 

- * norm referenced test 





CRT's & NRT's 




CRT Major Content X Grade Level 



LIST OF STATES FOR STQI PROJECT 



Comments: 



No program 



New 85 
No program 



Districts choose - 
no statewide test 





brace i j 


uraQc *f *D 


OariA 7-Q 
ui due /~ 


Oarlp 10- 

Uf QUC XU 


ALABAMA 


(CkT\ DUM 


\l*AI ; Iwl 


(PAT) RUM 
VUA 1 / Iwl 


(CAT) RWM 
\vn i / rain 


kl A CI/ A 

ALASKA 




DM 

in 


RM 




AKIZUNA 




(C AT) 


(CAT) 


(CAT) 




RM 


(SRA) RM 


(SRA) RM 


(SRA) 


PAI TFDRNTA 


FWM 


RUM 


RMM 


RUM 


mi no v»n 












RUM 

row i 


RUM 


RWM 


RUM 


nci ALU DC 


pTOC f 1 _3 \ 
U 1 DO \ 10 / 


CTRS(4-fi) 


CTBS (7 8) 


CTBS(il) 


CI flDTHA 

rLUKXUA 


nun 


RUM 

nun 


RUM 


RUM 

roil i 


re nor TA 


DM 


RM 


RM 

Hi" 


RM 


IJMJITT 


(SAT) RUM 
\wn 1 /nun 


SAT 


SAT, DAT 


RUM(STAS) 


t nAun 
IlWIU 






RWM 




ILLINOIS 


RUM 




RWM 


RWM 


INDIANA 


RUM 


RUM 


RWM 




IOWA 










KANSAS 


RM 


RM(446) 


RM 


RM 


KENTUCKY 


CTBS-U 


CTBS-U 


CTBS-U 


CTBS-U 


LOUISIANA 


RWM(2,3) 


RWM 


RWM 


RWM 


MAINE 




RM 


RM 


RW 


MARYLAND 


(CAT) 


(CAT) 


(CAT) RWM 




MASSACHUSETTS 


RWM 


RWM 


RWM 




MICHIGAN 




RM 


RM 


RM 






Content differs MINNESOTA 
by grade 

MISSOURI 
MONTANA 
MISSISSIPPI 
No program NEBRASKA 

NEVADA 

NEW HAMPSHIRE 
Local choice 3,6 NEW JERSEY 

Grade 11 ■ local option NEW MEXICO 

NEW YORK 

NORTH CAROLINA 
No program NORTH DAKOTA 

No program OHIO 
No program OKLAHOMA 

OREGON 

W = district choice PENNSYLVANIA 

RHODE ISLAND 
No information on CRT SOUTH CAROLINA 
No program SOUTH DAKOTA 

TENNESSEE 
TEXAS 
UTAH 

No program VERMONT 
No information on CRT VIRGINIA 

WASHINGTON 
WEST VIRGINIA 



Grade 1-3 

vW Q VJC X O 


Gradp 4-6 


Grade 7-9 


ul OIUC 


M 


RM 


RM 


RM 




RWM 




RWM 




RWM 




RWM 


RUM 


RWM 


RWM 


RWM 


SAT 

Onl 


SAT 


RWM 


RWM 

rsni 1 




DM 


RM 


RM 






RWM 




CTBS-U 


CTBS-U 


CTBS-U 


CTBS-U 


RM 


RM 


RWM 


RWM 


(CAT 1-3) 


(CAT) 


(CAT) 


RM 




RWM 


RWM 


RWM 


RM 


RWM 


RWM 


U 




A IUJ \t)U, 


1 IDO 




RMf 1-31 

ni l \ X 31 


L» i do u rsni i 


fTRS-ll RWM 
UIDO u rvnri 










RWM 


RUM 


DUM 
Knrl 


Knrl 






LI Do o 




Ul DO 




SRA 


SRA 


SRA.RM 




CAT 






CTBS-U 


CTBS-l 1 


CTBS-U 


CTBS-U 






WISCONSIN 
WYOMING 



Grade 1-3 Grade 4-6 Grade 7-9 Grade 10-12 
CTBS-U.R R CTBS-U CTBS-U.R 
NAEP NAEP NAEP 



Total number of states testing R, W, M 

                         Grade 1-3     Grade 4-6     Grade 7-9     Grade 10-12 
                         R   W   M     R   W   M     R   W   M     R   W   M 
CRT [May also do NRT]    17  11  17    23  14  22    25  19  25    24  16  22 
NRT only                 7   7   7     12  14  13    8   10  9     6   8   7 
(assumes all NRT include R, W, M) 



same test for both purposes. 18 of these states administer two 
separate statewide testing programs. 

5. Content Areas — virtually every state operating a 
program tests in the content areas of reading and mathematics. 
Less than half the states conduct writing assessments while over 
half also test in either language arts, science, social studies 
or some other area. In Chapter 4, we examine the content of state 
tests in greater detail. 

6. Type of Test — 20 states report the use of one of the 
major commercially published standardized achievement tests in 
their statewide assessment or competency testing programs. In 32 
states, at least one statewide test is either internally 
developed (perhaps by an outside vendor according to state 
specifications) or involves a concurrent assessment of NAEP 
tests. 

7. Grade Levels Tested — Statewide testing programs are 
most frequently conducted in grades 8, 11, 3, 4, 10, and 6. The 
counts below give total programs (T), assessment programs (A), 
competency programs (C), and states conducting both at that grade 
level (B): 

    Grade    T    A    C    B 
      8     32   22   10    5 
     11     29   16   13    3 
      3     27   14   13    2 
      4     25   18    7    3 
     10     24   12   12    4 
      6     21   13    8    1 

The fewest programs are conducted at grades 1 (8 total), 2 (11), 
7 (12) and 12 (13). See Chapter 4 for further examination of 
grade levels tested. 
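
The grade-level counts reported in item 7 can be checked for internal consistency. The Python sketch below copies the reported figures and confirms that, at each grade, total programs (T) equal assessment programs (A) plus competency programs (C). It also derives the number of distinct states testing at each grade as T minus B. That derivation is an assumption, resting on the premise that a state runs at most one program of each type at a given grade. The variable and function names are hypothetical.

```python
# (grade, total programs T, assessment A, competency C, states with both B),
# as reported in item 7.
GRADE_COUNTS = [
    (8, 32, 22, 10, 5),
    (11, 29, 16, 13, 3),
    (3, 27, 14, 13, 2),
    (4, 25, 18, 7, 3),
    (10, 24, 12, 12, 4),
    (6, 21, 13, 8, 1),
]


def grades_with_inconsistent_totals(rows):
    """Return the grades where T does not equal A + C."""
    return [grade for grade, t, a, c, _b in rows if t != a + c]


def states_testing_by_grade(rows):
    """Estimate distinct states per grade as T - B.

    Assumes each state runs at most one assessment and one competency
    program at a grade, so a state with both is counted twice in T.
    """
    return {grade: t - b for grade, t, _a, _c, b in rows}


inconsistent = grades_with_inconsistent_totals(GRADE_COUNTS)
states_by_grade = states_testing_by_grade(GRADE_COUNTS)
```

Under that assumption, the 32 programs at grade 8 correspond to 27 states, since 5 states are counted under both program types.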

8. Dates of Test Administration — The majority of states 
conducting statewide testing programs administer at least one 
test during the spring (typically March or April). Several 
states currently conducting concurrent assessments with NAEP 
during the Fall will shift to Spring testing when NAEP does. See 
Chapter 4 for further discussion of dates of test administration. 

9. Type of Sampling -- At least 24 of the 35 statewide 
assessment programs conduct census testing in most content areas. 
According to our records, all statewide competency programs test 
every eligible student at the target grade levels. 

10. Planned Changes -- Almost every state currently 
operating a testing program is planning a major change during the 
next few years (at least 36 states including those starting new 
programs, by our rough count). The most frequently mentioned 
changes are the addition of new grade levels, expansion to new 
content areas (direct writing assessments, science, social 
studies), tests of higher-order skills, change of commercial test 
used, redesign of program, revision of competency tests, 
concurrent assessment with NAEP, shift to census testing, and 
change in use of competency tests (e.g., adding a graduation 
requirement or a mastery component). 

The above points highlight the substantial amount of testing 
activity currently being conducted by states. While there is 
substantial variability across states in specific program 
characteristics, there is some degree of convergence on content 
areas, grade levels, and dates of test administration. At this 
somewhat superficial level, then, it appears that it would be 
feasible to pursue further the possibility of comparing test 
results (through linking and equating) from a significant number 
of states in certain content areas at certain grade levels. Of 
course, the potentially serious effects of testing conditions 
(e.g., type of test, grade level and dates of administration 
differences) on the accuracy of the linking would have to be 
determined and taken into consideration in any comparisons. 

The other major caveat that must be considered is that state 
testing practices are obviously undergoing significant changes in 
response to state-level reform efforts. It appears likely that in 
the near future, more students will be tested at more grade 
levels in a broader array of subject matters for a greater number 
of states. If these changes actually occur as planned, there 
would be an expanded base of commonality of testing practices, 
thereby improving possibilities of using state testing data for 
comparative purposes. Whether these changes will occur, and 
programs stabilize at this higher level of compatibility, remains 
to be seen. 

While we will withhold making most of our recommendations 
until later chapters, there is at least one that derives directly 
from the issues addressed here. Federal and State policy makers 
interested in the impact of state reforms will continue to need 
updated information about state testing activities. Regardless of 
whether state test data contribute to the set of national 
achievement indicators, these programs do change in response to 
reform efforts and, in many cases, serve as the basis for state 
and local assessments of the impact of reforms. Under these 
circumstances, we believe that it is essential to support 
recurring collection of data about state testing activities that 
can contribute to the information base for federal and state 
(both individually and collectively) policy formation. 






Chapter 3 

Consideration of Common Test Linking Strategies 



Statement of the Problem 

At the heart of the STQI Project's charge was the question 
of whether there is some feasible way of linking existing state 
tests to a common scale for state-level comparisons. The 
information about existing practices cited in the previous 
section points to the crux of the problem: even given the general 
impetus toward expanded testing, there is still substantial 
diversity in state practices that presents potential obstacles 
for a routine, straightforward linking and equating effort. The 
major potential obstacles can be summarized as follows: 

1. There are no statewide testing programs in some states. 
Eleven states (Colorado, Indiana, Iowa, Massachusetts, Nebraska, 
New Hampshire, North Dakota, Ohio, Oklahoma, South Dakota, 
Vermont) do not operate either a state-administered assessment or 
minimum competency testing program at this time. Although several 
of these states are in the process of establishing statewide 
testing programs (Colorado, Indiana, Oklahoma, South Dakota and 
Vermont are in various stages of development according to our 
sources), there is still no test to equate in some states and 
probably will not be one over the next several years. 

2. There is substantial variation among states in the focus 
of the content tested. Some states opt for broad-based 
assessments including direct writing assessments and the 
measurement of critical thinking while others concentrate on 
basic skills that all students at a given grade level are 
expected to have ("minimum competencies"); some states do both. 



3. The source of the tests used for state testing varies. 
Some states develop their own customized tests, others choose to 
administer a publisher-provided standardized achievement test, 
and still others customize a publisher-provided standardized 
test. Regardless of source, some states change either the test 
(e.g., from one publisher-provided test to another) or modify its 
content (generate new items, expand content coverage) regularly. 



4. States test at different grade levels. While testing is 
conducted in certain grades in many states (grades 8, 11, and 4, 
the grades covered by NAEP testing, are most popular), there are 
only a few grades where a majority of the states currently 
administer tests. 

5. States test at different times of the school year with 
April, March, and October the most popular months. In some states 
selected grade levels are tested in the fall while testing is 
conducted during the winter or spring at other grade levels. 






6. Some states exhaustively test all students at chosen 
grade levels while others collect data from only a sample of 
students at any one grade. 

Obviously, if the development of an achievement indicator 
for comparing states requires that all states test a comparable 
sample of students on equivalent content at the same grade levels 
at the same time of year, it would be impossible to meet the 
conditions necessary to establish such indicators in the short 
term. This is the case despite the professed federal and state 
interests in developing a better set of achievement indicators 
and a willingness to explore state-based options as data sources. 

The short-term picture (and presumably the long-term 
situation as well) is less dismal if it is not essential that all 
states be included in the comparisons and the other conditions 
for comparability are relaxed. The basis for relaxing the 
conditions should be that the comparison of the performance of 
states should only be made if there is sufficient empirical 
evidence to allow analytical adjustments for the effects of 
differences in administration conditions. Thus, even if State A 
normally tests a sample of its students using a minimum 
competency test at grade 7 in the fall and the chosen target 
grade and date for comparison is grade 8 in the spring, State A's 
performance can be compared with performance in other states if 
the effects of the difference in that state's testing conditions 
can be ascertained and a reliable and valid means for making the 
necessary adjustments is available. The effort necessary to 
obtain this evidence could be substantial, but the problems are 
more with logistics (obtaining the necessary cooperation and 
conducting the necessary special studies) and economics 
(obtaining the required funding for the special studies) than 
with technology. The methodology for generating the actual 
adjustments and incorporating them in the comparisons is 
well-established, with the most difficult part being to determine 
all the conditions that need to be empirically investigated. 

In the remainder of this section, we set aside for the 
moment questions about whether all states conduct testing 
programs and substantive concerns about the actual content of 
tests in order to focus attention on the alternative analytical 
approaches for expressing the test results from different states 
on a common scale. This examination will concentrate on 
logistical details of the psychometric alternatives considered 
rather than on the psychometric details themselves. Moreover, the 
focus will be on a few alternatives that the STQI Policy and 
Technical Panel viewed to be of greatest potential interest. 

Procedures for Examining Alternative Approaches 

At the November 1984 meeting of the STQI Project Policy and 
Technical Panel, a number of alternatives were considered for 
arriving at achievement indicators from existing state testing 
activities. The summary of that meeting provides details of the 
discussion and is partially reproduced in Appendix 3. The 
project charge following the meeting was to concentrate on 
elaborating the procedures for using equating and linking 
methodologies for arriving at a common scale for cross-state 
comparisons. Specifically, what additional new data collection 
would be necessary to apply these approaches in a substantial 
number of states and what are reasonable time and cost estimates 
for their expanded, full implementation? 

The Panel's recommendation on further examination of the 
equating and linking strategies was implemented by asking (a) 
Darrell Bock to provide a memorandum describing the psychometric 
alternatives and the conditions necessary to implement them and 
(b) other members of the Panel to react to Bock's memorandum 
prior to their April 1985 meeting (A number of Panel members 
provided written feedback following this meeting). In addition, 
CSE staff were to conduct a detailed examination of existing 
tests used by the states to provide a basis for judging whether 
there was sufficient overlap in content coverage and grade levels 
assessed among the states to actually implement any linking 
strategy of existing state tests. 

The results of these two activities (the Bock memorandum 
plus Panelists' comments (see Appendix 7) and the detailed 
content analysis of existing state tests) served as a starting 
point for an extended discussion of the strengths and weaknesses 
of various alternative approaches at the April 1985 meeting of 
the Panel. At the conclusion of the April meeting, the consensus 
among the panelists present was that 

o A pilot study of selected variations of one approach 
(the common test linking strategy) should be 
conducted in a limited set of skill areas for a 
specific grade range in order to determine both 
the quality of the equating under preferred 
conditions and the effects of various deviations 
from these conditions. 

Basic Psychometric Alternatives 

Stripped of details about the content to be scaled across 
the states, and the source of items to serve as a link, there are 
two basic psychometric alternatives for placing state test 
results on a common scale that would involve existing state tests 
(in contrast to the conduct of expanded NAEP testing): 

1. Matching scores from the test (items) chosen to serve as 
a link with existing state test scores (matched test data) 

2. Concurrent administration of the linking test and the 
existing state test (common anchor items) 

Both alternatives would require that a "common linking test" be 
administered within participating states to a sample of students 
and schools of sufficient size to carry out the desired equating 
to a common scale. 



Matched test data. The matched test data strategy would 
require that within a participating state, a sample of pupils be 
identified whose item responses to both the common linking test 
and the state test to be scaled could be matched. These two 
tests need not be administered at the same time within the state, 
but the ability to match at the item level for pupils is 
essential. 

If NAEP were to serve as the common linking test, this 
matching would entail using the sampled schools' rosters of 
students taking NAEP to link student data from the NAEP public 
use tapes with the data for corresponding students from the state 
testing program. Once a sample satisfying the matching conditions 
has been obtained, item response theoretic (IRT) scaling methods 
based on marginal maximum likelihood procedures would be 
employed to estimate item parameters for the state test using the 
parameter estimates from the common linking tests, and then the 
estimated item parameters for items from the state tests would be 
used to compute scores for pupils in the state samples. (The 
Bock memorandum describes the essential technical features for 
the scaling but the reader is referred to two other Bock 
references [Bock & Aitkin, 1981; Bock & Mislevy, 1982] for more 
complete specification of the psychometric basis for the 
scaling.) The resulting pupil scores (and hence their weighted 
or unweighted averages) are expressed on a scale that will be 
comparable to the scales for other states who use the common 
linking test. 
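
The final step described above, computing pupil scores from item parameters that have already been estimated, can be sketched roughly as follows. This is an illustration only: it assumes a two-parameter logistic (2PL) item response function, a standard normal ability distribution, and expected a posteriori (EAP) scoring on a fixed grid. The function names `p_correct` and `eap_score` and all parameter values are hypothetical, and the procedure specified in the Bock memorandum may differ in its details.

```python
import math

def p_correct(theta, a, b):
    """Assumed 2PL item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """Expected a posteriori ability for one pupil.

    responses: list of 1 (right), 0 (wrong) or None (item not administered)
    items:     list of (a, b) pairs, already on the common scale (assumed)
    """
    step = (hi - lo) / (n_points - 1)
    num = 0.0
    den = 0.0
    for k in range(n_points):
        theta = lo + k * step
        # Unnormalized standard normal prior weight at this grid point.
        weight = math.exp(-0.5 * theta * theta)
        likelihood = 1.0
        for u, (a, b) in zip(responses, items):
            if u is None:
                continue  # skipped items contribute nothing
            p = p_correct(theta, a, b)
            likelihood *= p if u == 1 else (1.0 - p)
        num += theta * likelihood * weight
        den += likelihood * weight
    return num / den

# Hypothetical state-test items (discrimination, difficulty).
state_items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
pupil_score = eap_score([1, 1, 0], state_items)
```

Because the score depends only on the items a pupil actually answered, pupils who took different items can still be placed on the same scale, provided the item parameters themselves are expressed on that scale.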

There are several critical logistical matters that are 
essential to attempts to employ the matched test data strategy. 
Possible difficulties in obtaining enough pupils in a 
participating state who could potentially be matched and in 
securing the local school site cooperation and support for 
carrying out the physical matching are the most salient 
questions. According to ETS sources, only seven states 
(California, Florida, Illinois, Massachusetts, Michigan, New York 
and Texas) have as many as 1000 students taking NAEP as part of 
its standard sample. In addition, there are other states 
(Connecticut, Minnesota, Wyoming) who participate in a concurrent 
assessment using NAEP items and whose results could presumably be 
directly scaled to the common scale chosen for state comparisons. 

Even in states with sufficient samples but whose state tests 
are administered at different grade levels and different times of 
the year from NAEP, states would have to arrange special 
administrations of their tests in the schools and at the grade 
levels of NAEP testing. In those states where NAEP samples are 
too small or the existing NAEP samples don't match up well with 
the schools and students sampled in the state's testing program 
(in sample as opposed to census testing states), data collection 
would have to be augmented (denser NAEP testing when the problem 
is insufficient NAEP sampling; expanded state or NAEP testing 
where the problem is insufficient sample match). The costs for 
this additional testing would have to be borne by some agency. 

Under current procedures for documentation of NAEP samples, 
the rosters of pupil names matched with NAEP case numbers never 
leave the local school sites. Unless the schools (or NAEP) are 
willing to provide these rosters to the state testing program, 
the actual match of student data from the two tests would have to 
be carried out by the local school's personnel. This requirement 
could introduce significant noise to the data due to recording 
errors, a likely occurrence under these conditions where the local 
personnel have little stake in the accuracy of the information 
they are requested to provide (e.g., Keesling, 1985; Neigher & 
Fishman, 1985). These kinds of recording errors are not 
restricted to the NAEP situation; they can be expected to occur 
as long as the information to be recorded is of limited value to 
the persons expected to compile it. On the other hand, there 
would be no incentives to falsify information either, so 
intentional misrepresentations should not be a problem. 
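
To make the matching step concrete, the sketch below joins NAEP records to state test records through a school roster and lists the cases that cannot be matched. The record layouts are hypothetical: it is assumed here that the roster can be reduced to a mapping from NAEP case number to a state student identifier, which the source does not specify. The name `match_records` is likewise illustrative.

```python
def match_records(roster, naep_records, state_records):
    """Join NAEP and state item responses through a school roster.

    roster:        {naep_case_number: state_student_id}  (assumed layout)
    naep_records:  {naep_case_number: [item responses]}
    state_records: {state_student_id: [item responses]}
    Returns (matched, unmatched_case_numbers).
    """
    matched = {}
    unmatched = []
    for case_no, naep_items in naep_records.items():
        state_id = roster.get(case_no)
        state_items = state_records.get(state_id) if state_id is not None else None
        if state_items is None:
            # Missing roster entry, or a roster entry that points nowhere
            # (for example, a mis-recorded identifier).
            unmatched.append(case_no)
        else:
            matched[case_no] = (naep_items, state_items)
    return matched, unmatched

# Hypothetical data: one bad roster entry (N3) and one missing entry (N4).
roster = {"N1": "S10", "N2": "S11", "N3": "S99"}
naep = {"N1": [1, 0], "N2": [0, 0], "N3": [1, 1], "N4": [1, 0]}
state = {"S10": [1, 1, 0], "S11": [0, 1, 1]}
matched, unmatched = match_records(roster, naep, state)
match_rate = len(matched) / len(naep)
```

Tracking the unmatched cases, rather than silently dropping them, gives a direct measure of how much usable sample is lost to recording errors at each site.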

There are clearly specific obstacles to using NAEP as the 
common linking test in the matched test data strategy. There are 
other alternative testing activities that are carried out in a 
sufficient number of states to warrant consideration as the 
common linking test (e.g., the SAT, ACT, ASVAB, commercially 
available standardized achievement tests). But each choice 
introduces its own set of logistical hurdles without even 
considering whether the content of the tests represented by the 
other choices is appropriate for the desired linking. 

Our analysis of the potential for the matched test data 
strategy for scaling purposes is that despite its theoretical 
promise, there are currently either insufficient data for 
matching in a significant number of states or the existing 
practices with respect to the proposed common linking test 
(whether one chooses NAEP, ACT, SAT, ASVAB, CTBS, etc.) would 
have to be modified to reduce the logistical and economic burdens 
they would entail. Moreover, there is a feasible alternative that 
takes advantage of the same psychometric methodology and requires 
substantially less effort and expense at the lower organizational 
levels of the educational system. 

Common Anchor Items. The common anchor items strategy 
requires that a set of anchor items be administered concurrently 
with all state tests that are to be linked. The same item 
response theory methods for expressing scores on a common scale 
that were described as part of the matched test data strategy are 
applicable here as well. The main distinction between the two 
strategies is that here the linking test is incorporated into the 
state's regular testing (either through embedding items or adding 
the anchor items to the beginning or end), thereby placing the 
data collection burden upon the states rather than on the local 
school sites. In those states which currently manage their own 
data collection activities, the logistics would be simplified and 
the reporting and recording errors would presumably be no greater 
for the anchor items than they are for the state's own test. 

States that do not currently conduct an assessment could 
choose to administer the common anchor items at the target grade 
levels and dates without the necessity of further equating and 
scaling. In states that routinely test at grade levels and times 
different from the target grades and dates, special 
administrations of the common anchor items (and preferably the 
state's own test as well) would have to be arranged along with 
the collection of the anchor items at the time of the normal 
state test data collection. These special administrations would 
be needed to provide the data to determine whether there are 
grade level and date-of-testing effects that warrant adjustment. 

The methodology to be used for equating the state tests with 
the common linking test does not require that all students taking 
the state test also take the linking test or that all students 
taking the linking test take the same set of items. The sample 
of students taking a test item from the common anchor set must be 
large enough to estimate the scaling constants for the state test 
items directly from the item responses without having to 
calculate individual student test scores (see Bock memorandum in 
Appendix 7 and referenced papers). The important size factor is 
schools rather than students. Bock estimates that approximately 
40-50 schools would have to be sampled at each grade level to 
adequately represent the population in most states for scaling 
purposes. 
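
One simple way to picture what scaling constants do is the mean/sigma method sketched below. It is a deliberately simpler stand-in for the marginal maximum likelihood procedure the memorandum describes, not a reproduction of it. It assumes that each anchor item has a difficulty estimate on the state's own scale and on the common scale, and that the two scales differ by a linear transformation. The function names and all values are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_constants(b_state, b_common):
    """Return (A, B) such that b_common is approximately A * b_state + B.

    b_state:  anchor item difficulties estimated on the state's scale
    b_common: the same items' difficulties on the common scale
    """
    if len(b_state) != len(b_common) or len(b_state) < 2:
        raise ValueError("need the same anchor items (at least two) on both scales")
    slope = pstdev(b_common) / pstdev(b_state)
    intercept = mean(b_common) - slope * mean(b_state)
    return slope, intercept

def rescale_item(a, b, slope, intercept):
    """Move a 2PL item's (discrimination, difficulty) onto the common scale."""
    return a / slope, slope * b + intercept

# Hypothetical anchor difficulties on the two scales.
A, B = mean_sigma_constants([-1.0, 0.0, 1.0], [-1.0, 1.0, 3.0])
# Apply the constants to a hypothetical state-only item.
a_new, b_new = rescale_item(1.0, 0.5, A, B)
```

Only item-level estimates enter the calculation, which is consistent with the point above that individual student scores need not be computed in order to place the state items on the common scale.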

The items from the common anchor set can be matrix sampled; 
that is, students could take different subsets of the test items 
from the common anchor item pool. This testing design has been 
used by NAEP and many states to expand the sample of items from a 
specific content area and could allow more content areas to 
be incorporated in the anchor set for the same length test. This 
item sampling strategy requires that more students from a given 
school be tested but reduces testing time in a given content area 
for any participating student. 
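
A matrix sampling design of this kind can be sketched as follows. The anchor pool is dealt into disjoint booklets, and the booklets are rotated through the students of a school. The rotation rule, the booklet count, and the names `make_booklets` and `assign_booklets` are assumptions for illustration; they do not reproduce the design actually used by NAEP or by any state.

```python
def make_booklets(anchor_items, n_booklets):
    """Deal the anchor item pool into n_booklets disjoint subsets."""
    return [anchor_items[i::n_booklets] for i in range(n_booklets)]

def assign_booklets(student_ids, booklets):
    """Rotate the booklets through the students of one school."""
    return {sid: booklets[i % len(booklets)]
            for i, sid in enumerate(student_ids)}

# Hypothetical pool of 12 reading items split into 3 booklets of 4 items.
pool = ["R%d" % i for i in range(1, 13)]
booklets = make_booklets(pool, 3)
plan = assign_booklets(["s1", "s2", "s3", "s4", "s5", "s6"], booklets)
```

With three booklets, each student answers a third of the pool, so three times as many students are needed to obtain the same number of responses per item. This is the trade-off between students tested and testing time noted above.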

The remaining logistics and consequences of incorporating 
anchor items into data collection in states that develop their 
own tests are relatively straightforward. In states using 
commercially available standardized tests (the CTBS, CAT, SAT, 
ITBS, and SRA tests are each used in multiple states at some 
grade levels), there are both potential additional constraints 
and possible economies. If a state wished to use a publisher's 
tests for its standard purposes (other than for the indicator 
activity), the anchor items should not be seeded within the test 
or administered at the beginning of testing because the 
non-standard administration can affect the validity of the test 
norms. Thus the procedures for joint administration would likely 
be more limited in states using published standardized tests. At 
the same time, as long as the different states using a specific 
standardized test do so under the same conditions (same grade 
levels and time of year), it would not be necessary to estimate 
the scaling constants anew for every state, at least for 
technical reasons. 



Preferred Option. Our analysis suggests that the common 
anchor item strategy is preferable to the matched test data 
strategy if a common test linking approach is to be used to 
express the test scores from different states on a common scale 
for comparison purposes. The basis for the choice is primarily 
logistical; the operation could be managed by the state testing 
agency as part of its regular testing activities without 
requiring potentially extensive new assistance from local schools 
and introducing the technical complexities of carrying out the 
required matching. On virtually any other aspect of the 
technical and logistical requirements for arriving at comparably 
scaled state test results, the problems are essentially the same 
for both matched test data and common anchor item strategies. 

The common anchor item strategy places the burden for 
carrying out new testing activities on the state-level testing 
operation. The increment in effort can be large or small 
depending on how far the state's current testing programs diverge 
from the targeted testing conditions for the linking effort. This 
burden will also fall disproportionately on smaller states who 
develop their own tests and on states that change the content of 
their test frequently (new scalings are required for each new 
state item pool). If the states are to be responsible for both 
gathering anchor data and conducting the psychometric analyses 
required to express their scores on the common scale, additional 
technical expertise might be needed or a mechanism for obtaining 
technical assistance in carrying out these activities will need 
to be developed. Thus the common anchor item strategy could be 
expected to significantly impact the operation of state-level 
testing programs and increase their costs. While there would most 
likely be secondary benefits associated with the enhanced 
expertise from participation in the multi-state linking effort, 
it remains to be seen whether state testing operations will 
accrue direct benefits commensurate with their additional 
responsibilities. 

Source of Common Anchor Items 

To this point we have avoided addressing the thorny question 
of the source of the items that would serve as the common anchor 
for scaling the different state tests. This is not a strictly 
technical matter since, as our content analyses (see Chapter 4) 
indicated, virtually all existing state-developed tests, 
standardized achievement tests, as well as NAEP, contain test 
items covering some of the skill areas that would be desirable to 
include in the common anchor. But each of these choices 
(state-developed items, standardized tests, NAEP) has a different 
set of strengths and weaknesses affecting its suitability for 
inclusion in the anchor set. There are also other sources, 
depending on the target content areas, for the achievement 
indicators; of course, new items could be written directly to 
fill desired content domains. Below we consider the strengths and 
weaknesses of the three main sources, explore the advisability of 
drawing upon yet other sources and provide a recommended decision 
strategy to select a source or sources for the common anchor. 

NAEP. The test items developed for and previously 
administered by NAEP represent a natural pool from which to 
select items for the linking effort. Historically, few within the 
testing community have quarreled with NAEP's item writing 
expertise. The actual NAEP test items are of high quality and 
through their inclusion in previous test administrations, have 
associated normative data about their empirical properties. In 
fact, in terms of their national representativeness, the norms 
for previously administered individual NAEP items are probably 
superior to the norms of items from either commercially available 
standardized tests or existing state-developed test items. 

Most of the limitations that NAEP would have as a linking 
device in the matched test data strategy (periodicity of 
assessment, small state-level samples, constraints on student 
identifiability) are no longer at issue when the question is 
whether NAEP items could contribute to a common anchor set. Even 
the supposed thinness in the content sampling of certain item 
domains is of less concern as long as there are other item 
sources that could be used to augment NAEP. The one potential 
technical limitation that still could diminish the value of NAEP 
items as a source would be the lengthy time interval between 
administrations in some content areas (affecting the utility of 
the normative information from regular NAEP administrations). 

Given the availability of normative data on test items of 
good quality and the presumed credibility of NAEP to various 
stakeholders, it is sensible to include NAEP among the sources 
for the common anchor items. At the same time, there are reasons 
for incorporating items besides NAEP in the anchor set. 
Technically, some states have argued over the years that NAEP 
does not adequately reflect their own curriculum (See Roeber 
letter in Appendix 7). The evidence from our content analyses of 
existing state tests supports this contention to a certain 
degree, assuming that state tests cover only what is part of or 
should be part of the state's curriculum. There are obvious 
remedies to this presumed deficiency which we consider below. 

Political considerations are also an important element in 
the argument against using NAEP as the sole source of common 
anchor items. Despite the extensive professional and 
practitioner involvement in the development of NAEP, it is, in 
the final analysis, a federal enterprise, thus raising the 
attendant concerns about a national curriculum. In fact, using 
only items from NAEP in the anchor set would make it the national 
standard for comparing states in much the same way as would a 
direct comparison of states with an expanded NAEP with 
larger state samples. The only differences between 
expanded NAEP and the common test linking strategy with NAEP as 
the sole source of items would be how items were selected 
(presumably some group representing states would have a major 
role in item selection under the common test linking strategy), 
the added value/complications/costs associated with the equating 
and common scaling, and the distribution of logistical and 
financial burdens for conducting the data collection. 
Essentially, the states, though claiming the prerogative of 
defining the content of tests by which they would be compared, 
would be virtually abdicating to a federal entity (NAEP) the 
actual basis for comparison. While there may be short-term 
technical and political advantages to such a decision, the 
precedent it establishes may have adverse long-term consequences 
for the demarcation of federal and state roles in education 
indicator efforts. 

Commercially available standardized tests. There are several 
commercially available standardized test batteries (CTBS, CAT, 
MAT, SAT, SRA, ITBS) that could be used as a source for the 
common anchor items. All of these tests have publisher-developed 
national norms and all sample broadly from what the publishers 
perceive to be national-consensus objectives (as determined 
primarily by textbook examinations). A significant number of 
states already use one of these tests as their state assessment 
and many districts within states who develop their own assessment 
tests also administer a standardized test for their own purposes 
(e.g., for compensatory education evaluations). 

The problems with using a standardized test as the common 
anchor have to do with matters of test selection, test security, 
and the representativeness of test norms and content. Selecting 
a single test battery from among those commercially available 
would create a marketing advantage for the selected publisher and 
would presumably entail untoward governmental intrusion into a 
competitive private enterprise. The widespread use of existing 
batteries creates test security problems that have led to gradual 
deterioration of the validity of these tests as measures of 
learning (as opposed to test coaching) in the past; a secure form 
of the standardized test would be needed if it were to serve as an 
anchor over time. The concerns about norm representativeness have 
to do with the problems of selective school district cooperation 
in publishers' norming studies (e.g., Baglin, 1981); as a result 
none of the publishers have truly national norms but rather 
publisher-specific norms. Finally, the challenges to the 
contents of standardized tests have to do with their failure to 
incorporate important content objectives, especially at the lower 
and upper ends of the subject matter continuum. The traditional 
psychometric procedures for standardized test development select 
highly discriminating items that are likely to fall in the middle 
range of difficulty; thus content known by either most students 
or only a few students is typically eliminated. 

These problems with commercially developed standardized 
tests argue against their use as the single source of common 
anchor items. Whether selected items from standardized tests 
could be included as part of the anchor is unclear. Certainly, 
these tests contain items covering some of the content that 
should be included in the anchor set and there should be 
substantial data about their actual empirical properties. But the 
tests, and hence their items, are in the private domain and 
publishers would have to be willing to cooperate in releasing 
selected items to the linking effort. Whether marketing forces 
would support or hinder such cooperation is unclear at the 
present. 

State-developed items. Our content analysis of existing 
state-developed assessment and minimum competency tests (see next 
chapter) identified a wide range of both skills assessed and the 
quality of the test items used to measure them. Some states have 
been particularly innovative and exemplary in measuring selected 
objectives; several states (primarily assessment as opposed to 
minimum competency states) devote significant portions of their 
test content to what are normally characterized as higher-order 
or higher-level skills (e.g., inferential comprehension in 
reading with passages from different subject matters, 
explanations and problem solving in mathematics). In yet other 
states, items assessing functional literacy skills are 
particularly well-developed. 

Taken as a whole, the set of items developed by states 
measures virtually every conceivable skill that one might consider 
to be pertinent to a comprehensive representation of the content 
domains of reading, mathematics, and writing. While we did not 
explicitly examine other content areas (e.g., science, social 
studies), our sense is that testing practices in these areas are 
also of good quality and are as broadly representative of 
desirable content as most other sources under consideration. 

One obvious limitation of state-developed test items is the 
lack of nationally representative normative data in most 
instances. In most states, however, there is no shortage of 
evidence about the empirical behavior of items used repeatedly 
over the years of the assessment. After all, certain states 
annually test every student at a given grade level, yielding tens 
of thousands of cases for every year a test item is used. 
Moreover, just as with NAEP and with commercially developed 
tests, the items selected for inclusion in state assessments 
undergo multiple rounds of expert and practitioner review and 
empirical examination before their actual use. In addition, some 
states have carried out studies to equate their assessments to 
commercially available tests to provide national perspectives on 
their students' performance. So while the empirical evidence from 
state-developed test items differs from the evidence available 
on NAEP and commercially developed tests, there is no evidence of 
uniformly poor quality or lack of representativeness of important 
content and some evidence of collective broader scope. 

There are political advantages in using state-developed test 
items as a source for the common anchor items. If the common 
anchor items were chosen solely from state-developed tests, 
the specter of a federal presence in the specification of the 
basis for state comparisons could be virtually eliminated. Any 
option for selecting the common anchor item set that includes a 
substantial state role in the specification of the content to be 
measured and significant state representation among the items 
selected would provide safeguards against perceptions of federal 
intrusion upon state prerogatives. 

There are political disadvantages as well in using state- 
developed test items as the common anchor core. Without any 
other sources of nationally normative data initially, it would 
take time to establish a basis for comparison (i.e., what are 
significant differences among states at a given point and over 
time), and efforts would have to be made to establish public 
credibility and understanding of the meaning of the comparisons. 
A potential additional trouble spot could be the uneven 
representation across states in their contribution of items to 
the common anchor set. States without assessments could not 
contribute at all while those states using commercially developed 
tests would have to obtain special permission before contributing. 
It is also clear that differences in value preferences among 
states would have to be overcome in arriving at consensus on 
which skill areas to include and which items to select. Just as 
with other organizations, the "not invented here" syndrome is 
likely to be present in certain states and will have to be dealt 
with. 

On balance, we can see no reason flatly to exclude state- 
developed test items from the common anchor set and both 
technical and political advantages to their inclusion as a source 
along with other options. Technically, the basic evidence to 
support the inclusion of any specific state's test item in the 
anchor set should be the same as with any item from other 
sources. The logistics of data collection using state-developed 
items as part of the common anchor are no different from other 
options. Finally, the political advantages are potentially 
substantial while the possible political liabilities for the 
federal government are limited. 

Other sources. It seems to us that all sources of well- 
developed test items with sufficient data about their empirical 
properties could conceivably contribute to the common anchor item 
set. There are test item banks operated by commercial vendors or 
developed by federal research laboratories or school districts 
that could be considered. If it were deemed important and if 
necessary licensing arrangements could be made at reasonable 
costs, items developed for the ACT and SAT could be included. 
There are also special purpose testing programs (e.g., ASVAB) 
operated by other federal and state agencies that could serve as 
sources. 

A particularly appealing source of potential items is the set 
of tests used in the series of cross-national achievement 
surveys conducted under the auspices of the International 
Association for the Study of Educational Achievement (IEA). 
During the early part of the 1980s, studies in the content areas 
of mathematics (the Second International Mathematics Study), 
science (the Second International Science Study), and writing 
(the Written Composition Study) have been conducted in over 
twenty countries. The student performance data from these studies 
is nationally representative (to a greater or lesser degree) in 
most countries including several of our major economic 
competitors (e.g., Japan in mathematics and science, several 
major western European countries). There appears to be 
substantial interest at both state and federal levels and from 
the private sector in international educational statistics and 
comparisons (the level of involvement of these constituencies in 
the April 1985 NCES-sponsored conference on international 
education statistics is offered in support of this inference on 
our part). The actual inclusion of selected items from these 
international studies within the common anchor would provide a 
beginning, although limited, opportunity for regularly collecting 
performance information that could be used for international as 
well as national comparison purposes. 

Preferred option. Given a decision to proceed with the 
common anchor item strategy as we have recommended, our analysis 
suggests that the items contributing to the common anchor set 
should be selected from multiple sources (NAEP, commercially 
available tests, state-developed test items, policy relevant and 
technically adequate additional sources such as the IEA tests). 
There are multiple sources of items that on purely technical 
grounds could contribute to the common anchor item core. Both 
technical and political considerations lend support for selecting 
an anchor set that includes items from multiple sources, at least 
one of which is the combined pool of state-developed test items 
from existing testing programs. If properly implemented, the 
multiple sources option strikes a desirable balance among state 
and federal (and possibly private sector) contribution, among 
various normative bases for comparison once the linking has been 
established, and among forms of legitimation and credibility by 
potentially competing constituencies (the public, media, 
industry, and various groups representing education professionals 
and political interests). 

While an eclectic mixture of sources is desirable, we 
believe that the mechanisms for establishing the skills to be 
included in the core, selecting items to represent the skills and 
specifying the rules and acceptable conditions for participation by 
individual states should be developed and administered primarily 
by collective representation of the states (such as through the 
new CCSSO Assessment and Evaluation Coordinating Center). Given 
the traditional state responsibility for education, significant 
state involvement in these phases of achievement indicator 
development is essential. And, as long as legitimate federal 
needs for achievement indicators for monitoring purposes are met, 
the federal presence under this proposed operation could remain 
benign, contributing substantively at the states' initiative and 
serving as a source of technical and economic assistance where 
appropriate. 






Implementation Issues 

If a decision is reached to proceed to develop state-level 
comparisons using the common anchor item strategy, what 
additional decisions would be necessary to implement the 
preferred states-coordinated development of the achievement 
indicators? This question raises the necessary implementation 
issues, both with respect to the operation of the coordination of 
the equating effort and individual states' participation in the 
comparison. We are not attempting to substitute our judgments for 
those of the persons who presumably would be designated by the 
states to coordinate the effort and those individuals within 
states who would be expected to implement the activities 
necessary for test equating and scaling. Our purpose is strictly 
to point out some of the issues that the federal government, the 
coordinating state agency, and the states might consider if they 
choose to implement the proposed plan. 

1. Documentation -- Procedures for documenting contents of 
existing state tests should be specified so that questions of 
what is being equated to what can be addressed. 

2. Content Specification -- Specification of content 
represented in the common anchor set should be at the lowest level 
possible (subskill level) even if achievement indicators, at 
least initially, are to be reported at higher levels (skill or 
content area). This level of specification minimizes the 
possibility of overlooking meaningful content, maximizes the 
possibility that selected items for the common anchor will be 
scalable and unidimensional, and places the greatest constraints 
on agreement about content assignment. 

3. Criteria for Item Consideration -- The minimum criteria for 
considering an item for inclusion in the common anchor item set 
should be that 

o The item should measure a skill that should be 
represented in the common anchor item set, and 

o There should be sufficient empirical evidence available 
about the item to ascertain its behavior for the major 
segments of the student population with which it will 
be used. 

4. Item Selection Procedure -- The selection of items to 
represent skills in the common anchor item set should be made by 
teams of curriculum and testing specialists from a broad-based 
pool of items with as little identification information as to 
source as is technically feasible (to guard against political and 
social biases in selection). Empirical data should initially be 
provided without the identifying features of norm source. In 
later phases, additional technical information about norm quality 
should be considered if too many items are acceptable by other 
judgmental criteria. 






5. Testing Conditions Specifications -- The following set of 
testing conditions should be specified: 

o Target grades and range of testing dates should be 
specified along with requirements for special studies 
in those states that normally test outside the chosen 
range or do not test at present but decide to 
participate. 

o Procedures for concurrent administration of the common 
anchor item set with the existing state test should be 
specified for the various alternative types of state 
tests (matrix sampled, state-developed single form, 
commercially developed standardized test). 

o Auxiliary information for checking subgroup bias and 
determining sample representativeness (for equating and 
scaling purposes) should be specified. 

o Minimum sample sizes (for both schools and students) 
should be established. 

6. Pilot Study of Testing Conditions -- A design for a pilot 
study of effects of deviations from target testing conditions 
should be developed. 

Our remaining recommendations regarding the implementation 
of the common test linking strategy have to do with the 
establishment of an effective political, institutional, and 
economic environment for this indicator effort. First, it will 
be a serious matter to develop the necessary levels of 
political support for this activity. Key participants are, of 
course, the Chief State School Officers, their staffs, and other 
State education officials, but other prominent State officials, 
including the Governor, Members of Congress, and State 
legislators may need to be involved. Representatives 
of large city school districts, the education associations, and 
the private sector should participate as 
appropriate. Broad-based support for the idea should be 
developed. 

Second, the matter of developing an institutional structure 
for the conduct of this activity should be considered. Having 
an organization of States manage the process 
would avoid the specter of Federal directive, and the Council of 
Chief State School Officers' Assessment and Evaluation 
Coordinating Center proposal deserves consideration for this 
purpose. 

Third, it is essential that technical assistance and 
oversight be established to assure the quality of technical and 
methodological operation of the linking and equating, of the 
content of measures, and of validity of interpretations. This 
oversight should be provided by a panel, perhaps modeled on the 
panels advising the NAEP activity. 






Fourth, a long-term, secure basis of financial support for 
this activity should be assured. The costs will not be high but 
resources should be regularly available. 



Summary and Recommendations 

In this chapter we considered directly the alternatives for 
linking existing state tests to a common scale for state-level 
comparisons. The existing testing conditions in states that might 
aid or hinder the linking effort were discussed. The relative 
merits of two psychometric alternatives (a matched test data 
strategy and a common anchor item set) for linking state tests 
through equating to a common scale were considered in detail. 
Possible sources of items to serve as the common link were 
identified and evaluated. Implementation issues that should be 
addressed if a decision were made to proceed with the linking 
effort were delineated. 

The primary recommendation was that the test linking 
strategy be tried on an exploratory basis (for perhaps a two-year 
period) after which judgments about continuation, modification, 
or expansion could be made. The guiding features of this 
exploration should be that 

o The comparison of the performance of states should only 
be made if there is sufficient empirical evidence to allow 
analytical adjustments for the effects of differences in 
administration conditions. The exploratory study should generate 
this needed empirical evidence. 

o The common anchor item strategy, wherein a common set of 
linking items is administered concurrently with the existing 
state test to an "equating-size" sample of schools and students, 
should be used as the basis for expressing test scores from 
different states on a common scale for comparison purposes. 

o The items contributing to the common anchor set should be 
selected from multiple sources including NAEP, existing state- 
developed tests, commercially available tests, and other policy 
relevant and technically adequate sources such as the IEA tests. 






Chapter 4 

Content Analysis of Existing State Tests 



Statement of the Problem 

Two of the recommendations for further work that were made 
at the First Panel meeting had to do with obtaining additional 
details about the contents of existing state tests. Specifically, 
CSE staff were asked to proceed with the following two tasks: 

1. Conduct an examination of the content of existing state 
tests including analysis of both content specifications and 
actual items where feasible. 

2. Explore further the feasibility of developing summary 
indicators of trends with respect to diversity of content 
measures and complexities of skills measured. 

The impetus for these recommendations was the realization 
that there is little extant information about the specific 
content contained in state-administered tests, especially those 
that are internally developed. Several Panelists pointed out that 
not all states operating internally developed programs were 
conscientious about developing and publishing content 
specifications for generation of test items. In addition, the 
match of test items to specifications and the distribution of 
items among objectives may be uneven in some states. 

The Panelists had two specific interests for urging that 
more detailed information be gathered about the content of the 
state tests. First, the psychometric technology (essentially item 
response theory methods using marginal maximum likelihood 
estimation procedures) that would be used to estimate the item 
parameters needed for the equating and scaling of state tests via 
a common linking measure requires that the items to be scaled form 
a homogeneous, unidimensional set. This requirement typically 
entails that test items be scaled at the subskill (e.g., 
... of percent) or skill (e.g., numbers and numeration) 
level (Bock calls this level of classification 
"indivisible curricular elements") even when the indicator is to 
be reported at the general content area level (e.g., mathematics). 
Thus, details of the contents of the state tests are necessary 
for assigning items to homogeneous clusters suitable for linking. 
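
The unidimensionality requirement can be made concrete with a worked form of the model. The report does not say which item response model would be used; the two-parameter logistic form below is an assumption chosen for illustration, and the symbols (a_j, b_j, theta, g) are ours rather than the report's. Each item j in a cluster is described by a discrimination a_j and a difficulty b_j, and every item in the cluster is assumed to depend on the same single proficiency theta. Marginal maximum likelihood estimates the item parameters by integrating theta out over an assumed population distribution g.

```latex
% Assumed two-parameter logistic item response function for item j
P_j(\theta) = \frac{1}{1 + \exp\left[-a_j(\theta - b_j)\right]}

% Marginal likelihood of the item parameters for N students and J items,
% where x_{ij} = 1 if student i answers item j correctly and 0 otherwise
L(a, b) = \prod_{i=1}^{N} \int \prod_{j=1}^{J}
          P_j(\theta)^{x_{ij}} \left[1 - P_j(\theta)\right]^{1 - x_{ij}}
          g(\theta)\, d\theta
```

Because a single theta appears in every factor of the inner product, items that draw on different skills violate the model. This is why items would be grouped into homogeneous subskill or skill clusters before scaling.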

Second, the question of whether there are significant 
differences in the content tested across states is a matter of 
policy interest, in and of itself. Certainly, states administer 
tests that are designed to serve different purposes (basic 
skills, minimum competencies, proficiency, critical thinking, 
higher order skills) and hence presumably cover different 
content. Given the widespread interest in strengthening the 
curriculum across the states, and the explicit or implicit 
relationship between what's tested and what's taught, questions 
about the diversity of content coverage across states become 
salient. This is especially likely if indicators of content 
coverage can be tracked over time to see their relationship to 
curricular changes and changes in test performance. 

Pamela Aschbacher designed and carried out the detailed content 
analyses reported in this chapter and prepared the description of 
procedures. 

A caveat is in order before proceeding to describe and 
discuss the results of our extended content analyses. CSE 
attempted to examine the content of state testing programs to the 
extent possible within the time and resource constraints 
governing the project. The original strategy was to sample a few 
states that developed their own tests and carry out an in-depth 
examination of the tests' content. 

As the task developed, however, it became clear that the 
overall goals of the project would best be served by casting the 
net as broadly as possible to cover as many states at as many 
grade levels as we could gather sufficient information to warrant 
a content examination. Moreover, we decided to examine 
commercially available standardized tests used in state testing 
programs as well (when we could obtain them). Because the 
detailed content focus was not salient at the time of the 
telephone interviews with state test directors, we had not 
specifically emphasized submission of tests and content 
specifications in our requests for reports prepared by states. 
Therefore, the availability of this type of information was spotty 
initially although we later requested additional reports from 
some states. 

Our efforts in this area mushroomed. By the time of the 
Second Panel Meeting in April, much of the detailed descriptions 
of state tests (reported in Appendices 15 through 18 along with 
the procedures for the conduct of the content analysis) had been 
completed. At that meeting, however, the Panelists devoted their 
attention to addressing the question of which option for state 
linking was most feasible and to specifying the parameters for a 
possible exploratory study of this option. While the 
results of the content analysis were of interest and useful for 
addressing the broader purposes of depicting coverage and 
facilitating the development of indicators of content coverage, 
there was actually too much detail for serving the more narrow 
purpose of selecting grade levels and skills to be included in 
the exploratory study. Rather than proceeding with further 
detailed work on content coverage indicators, CSE staff, instead, 
were urged to develop simplified depictions of the results of the 
content analysis to facilitate the choice of content that would 
be piloted. 



Following the Second Panel Meeting, CSE staff worked to 
respond to the modified charge in the area of content 
examination. Much of the detailed description of procedures and 
results of the content analysis is contained in this report. But 
the primary emphasis in discussing the results of the analysis 
will be on the simplified data presentation, resulting 
recommendations about target content areas for the exploratory 
study, and a characterization of the implications of these 
recommendations for state participation in the exploratory study. 
Further exploration of other issues is left for another study. 

Procedures 

The purpose of this part of the STQI Project was to examine 
the statewide testing programs in all the states in the content 
areas of reading, math, and writing in grades 1 through 12, in 
order to present a national picture of what is currently being 
done and to make policy recommendations regarding the feasibility 
of quality indicators in the area of content coverage. 

In order to accomplish this purpose, during the brief 
telephone interviews conducted by CSE staff, the directors of 
state testing programs were requested to send CSE a copy of the 
appropriate tests, manuals, technical reports, and so forth. 
(See Appendix 8 for a list of documents provided by states.) 

Tests included in the analysis were all currently used 
statewide tests given in grades 1-12 in reading, math and writing 
(including writing samples and writing skills such as 
punctuation, grammar, word usage, and organization). The tests 
included those labeled assessment tests, minimum competency or 
proficiency exams, and inventories of basic skills. Some were 
commercially developed; others were criterion-referenced tests 
developed by state testing committees comprised of curriculum and 
evaluation specialists and teachers. 

The analysis of tests and materials proceeded in the 
following manner. The objective of the analysis was to describe 
the breadth and depth of each state's testing program in reading, 
math, and writing. In order to accomplish this, "breadth" and 
"depth" were defined, and a matrix of major-skill-areas-by- 
cognitive-hierarchy was developed for each of the 3 content areas 
(reading, math, and writing). See Appendix 9 for these matrices. 

The major skill areas and their subskills within the content 
areas were identified with the aid of several states' materials 
and three booklets: 

National Assessment of Educational Progress, Reading 
Objectives 1983-84 Assessment 

National Assessment of Educational Progress, Math 
Objectives 1981-82 Assessment 

National Assessment of Educational Progress, Writing 
Objectives 1983-84 Assessment 






Content Areas, Major Skill Areas, and Subskills are related as 
follows: 

Content Area (e.g. Reading) 
    Major Skill Area (e.g. Inferential Comprehension) 
        Subskill (e.g. Infer main idea) 
        Subskill (e.g. Infer cause or effect) 
        Subskill (e.g. Infer author's purpose) 
    Major Skill Area 
        Subskill 
        Subskill 


The major skill areas in each content area follow: 

READING (Content Area) 
    Word Attack 
    Vocabulary 
    Literal Comprehension 
    Inferential Comprehension 
    Study Skills 
    Attitude Toward Reading 

MATHEMATICS 
    Numbers and Numeration 
    Variables & Relationships 
    Geometry 
    Measurement 
    Statistics & Probability 
    Computers, Calculators, Technology 
    Attitude Toward Math 

WRITING 
    Conventions 
    Grammar 
    Word Usage 
    Organization 
    Attitude Toward Writing 






Next, these lists of skills and their subskills (e.g. 
"identify word meaning in context" is a subskill of the skill 
area "Vocabulary") were classified according to a 4-level 
modification of Bloom's taxonomy of educational objectives to 
form the 3 content-by-hierarchy matrices. The 4 hierarchy levels 
included in this study were: recall; routine manipulation or 
literal comprehension; inference, translation, explanation, or 
judgement; and application, problem solving. 

The materials for each state were carefully examined to 
classify the test items according to the content-by-hierarchy 
matrices. In some states, more than one test was used, so all 
tests of the relevant content were analyzed. The number of test 
items for each subskill in the matrices was recorded for each 
test at each grade level. For writing samples, the number and 
type of writing samples were recorded together with information 
about the type of scoring system used. 

The materials received from the states varied greatly in 
scope and detail provided. Where actual tests were provided, they 
served as the primary source of data. In other cases, manuals or 
reports had to be relied upon to provide the information. At one 
end of the continuum were reports that made vague mention of a few 
of the skills tested but gave no comprehensive list of skills or 
details on how many items of each were used. At the other end 
were reports that included complete test specifications with 
detailed descriptions of objectives, skills, sample test items, 
and number of such items on the tests by grade level. 

For each state, CSE staff attempted to extract the most 
specific level of data possible. Hence, for some states it was 
only possible to indicate that certain subskills were indeed 
tested without any indication of the number of such items on the 
test. For others, it was difficult to match their descriptions of 
the test content with the matrices of subskills for several 
reasons, often because some of the test reports lumped several 
different subskills together with only a total number of test 
items specified or because the reports gave overly brief 
descriptions of the skills tested (e.g. "main idea" did not 
specify whether the student had to identify an explicitly stated 
main idea or infer it from the passage). A list of decision 
rules was generated to guide the content analysis and 
summarization in these situations, and a 6-point rating system 
was developed to describe the level of specificity of the 
information sources (see Appendices 10, 11 & 12). Appendix 13 
contains sample items for each cell of the Math, Reading, and 
Writing matrices for which at least one state had test items. 

An attempt was made to analyze all commercial, norm- 
referenced tests used by several of the states. Specimen Sets 
were ordered directly from the publisher. Unfortunately, not all 
commercial tests were received in time to be analyzed for this 
study. However, those included do provide a kind of sample of 
what such tests typically include. 






After each state's materials had been examined, the data 
were summarized for each content area for 4 grade groupings: 
grades 1-3, 4-6, 7-9, 10-12. These summaries included the total 
number of test items and the number of different subskills tested 
in each major-skill-area-by-hierarchy-level cell in the matrix. 
For reading and writing, the number of cells for which test items 
occurred was relatively small (6 different cells). However, for 
math, the number of cells was larger, so the summary was done 
slightly differently. Numbers of items and subskills tested were 
summarized separately for each of 5 major skill areas and 4 
hierarchical levels rather than the 20 different cells that would 
have resulted from crossing these axes. This method provided a 
relatively simple picture while still indicating the breadth and 
depth of content and cognitive level. In addition, a separate 
20-cell math matrix of numbers of items and subskills was created 
for 5 major states at grades 4-6 and 7-9. Included on the 
summaries is each state's information source rating, which 
provided a measure of our confidence that what was reported is 
actually measured by the state's tests. 
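
The marginal summary used for math can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure CSE staff actually used: the function name `summarize_margins`, the tuple layout, and the example items are all hypothetical.

```python
from collections import defaultdict


def summarize_margins(items):
    """Summarize classified items by skill area and by hierarchy level.

    `items` is a list of (skill_area, hierarchy_level, subskill) tuples,
    a hypothetical layout chosen for this sketch. Returns two dicts that
    map each skill area (or level) to a tuple of
    (number of items, number of distinct subskills).
    """
    by_area = defaultdict(lambda: [0, set()])
    by_level = defaultdict(lambda: [0, set()])
    for area, level, subskill in items:
        # Update both margins instead of the full area-by-level cell.
        for table, key in ((by_area, area), (by_level, level)):
            table[key][0] += 1
            table[key][1].add(subskill)
    area_summary = {k: (n, len(s)) for k, (n, s) in by_area.items()}
    level_summary = {k: (n, len(s)) for k, (n, s) in by_level.items()}
    return area_summary, level_summary


# Hypothetical items for one state's math test at one grade grouping.
example_items = [
    ("Numbers and Numeration", 1, "add whole numbers"),
    ("Numbers and Numeration", 1, "add whole numbers"),
    ("Numbers and Numeration", 3, "explain place value"),
    ("Geometry", 2, "classify shapes"),
    ("Measurement", 4, "solve perimeter problems"),
]
area_summary, level_summary = summarize_margins(example_items)
```

Summarizing the two margins separately gives 5 + 4 = 9 entries per state instead of 20 cells, at the cost of no longer showing which hierarchy levels occur within which skill area.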

For the purpose of this study, "breadth" was viewed as the 
spread of test items across major skill areas and across the 
cognitive hierarchy within a given content area. The greater the 
number of different subskills, skill areas and hierarchy levels 
at which a state has test items, the greater the "breadth." 
"Depth" was defined as the number of test items for a given 
subskill at a given level of the hierarchy. The greater the 
number of items, the greater the "depth" for that particular 
subskill. As discussed earlier, other things being equal, 
broader tests with greater depth of coverage are considered to be 
"better". 
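
The two definitions above can be expressed as simple counts. The sketch below assumes each test item has already been classified as a (skill area, hierarchy level, subskill) tuple; the function name `breadth_and_depth` and the example data are hypothetical and are not taken from the study's materials.

```python
from collections import Counter


def breadth_and_depth(items):
    """Compute breadth and depth counts for one content area.

    `items` is an iterable of (skill_area, hierarchy_level, subskill)
    tuples (a layout assumed for this sketch).
    Breadth: number of distinct skill areas, hierarchy levels, and
    subskills that have at least one item.
    Depth: number of items for each (subskill, hierarchy_level) pair.
    """
    items = list(items)
    breadth = {
        "skill_areas": len({area for area, _, _ in items}),
        "hierarchy_levels": len({level for _, level, _ in items}),
        "subskills": len({subskill for _, _, subskill in items}),
    }
    depth = Counter((subskill, level) for _, level, subskill in items)
    return breadth, depth


# Hypothetical reading items for one state.
reading_items = [
    ("Vocabulary", 1, "identify word meaning in context"),
    ("Vocabulary", 1, "identify word meaning in context"),
    ("Inferential Comprehension", 3, "infer main idea"),
    ("Inferential Comprehension", 3, "infer cause or effect"),
]
breadth, depth = breadth_and_depth(reading_items)
```

Breadth is reported here as three separate counts rather than one combined score, because the text defines it across subskills, skill areas, and hierarchy levels without saying how they would be weighted.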

In addition, lists of states were compiled for each of the 
criteria below: 

1. states with "breadth" in any content area 

2. states with "depth" in most of the skill areas of 
reading, math, or writing 

3. states which emphasized higher order subskills (e.g. 
    for reading: inferential & evaluative comprehension 
    for math: any content requiring the 3rd or 4th 
        level cognitive skill: explaining, 
        translating, judging, or problem solving 
    for writing: organization & writing sample) 

4. states with items on attitude toward the content area 

5. states with writing sample tests, by grade level 

6. states which provided good documentation of their tests 






Basic Results 



The detailed content examination of state tests is provided 
in Appendices 15 - 18. (The key for interpreting these detailed 
summaries appears as Appendix 14.) These tables do depict the 
diversity of emphasis among the states in the material chosen for 
statewide testing. Some states sample broadly across skill areas 
with many subskills and many items per subskill (e.g., 250 items 
covering 30 subskills, typically matrix sampled); others measure 
many subskills with only a few items (e.g., 50 items covering 20 
subskills); while still others test only a few subskill areas 
with lots of items (e.g., 80 items covering 8 subskills). Later 
in this chapter, we provide selected examples of what we view to 
be exemplary practice from the perspective of a broad-based, 
in-depth, balanced distribution of content with significant 
sampling of higher order skills. 

For the present, however, we seek a simpler depiction of 
coverage for the purposes of selecting skills to concentrate on 
in an exploratory study. To accomplish this task, the content 
reported in Appendices 15 - 17 was used to develop state-by-skill 
area matrices for reading, math, and writing at each of the four 
grade level clusters. The entries in these matrices were coded as 
follows : 

SKILL CODES 

1 = State test includes at least one test item in the skill 
area 

0 = State test does not include any items in the skill area 

blank = No State test reported at this grade level 

N = State tests at this grade but insufficient information 
on hand to determine what content was tested 

The 12 state-by-skill area matrices were analyzed by Sato's 
Student Problem Chart procedure (See Harnisch (1983) for a 
description). This procedure (a) reordered the states vertically 
so that those testing in the most skill areas appear first and 
those testing in the fewest skill areas appear last, and (b) 
reordered the skill areas horizontally so that those skill areas 
tested most often by states appear first and those skill areas 
tested least often by states appear last. A summary table of 
number of states testing in a given skill area (as well as other 
information not reported here) was also generated. 
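
The two reordering steps can be illustrated with a short Python sketch. This is not the software used in the study: the function name `sp_reorder`, the state labels S1-S4, and the toy matrix are hypothetical, and only the row and column sorting described above is reproduced. Rows coded N or blank are assumed to have been set aside before the matrix is built.

```python
def sp_reorder(matrix, states, skills):
    """Sort rows by descending row total and columns by descending column total.

    matrix[i][j] is 1 if state i tests skill j and 0 otherwise.
    Python's sort is stable, so ties keep their original order.
    """
    col_totals = [sum(row[j] for row in matrix) for j in range(len(skills))]
    col_order = sorted(range(len(skills)), key=lambda j: -col_totals[j])
    row_order = sorted(range(len(states)), key=lambda i: -sum(matrix[i]))

    new_states = [states[i] for i in row_order]
    new_skills = [skills[j] for j in col_order]
    new_matrix = [[matrix[i][j] for j in col_order] for i in row_order]
    counts = {skills[j]: col_totals[j] for j in col_order}
    return new_states, new_skills, new_matrix, counts


# Hypothetical four-state example; W, V, L, I, S are the reading skill codes.
states = ["S1", "S2", "S3", "S4"]
skills = ["W", "V", "L", "I", "S"]
matrix = [
    [0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
]

new_states, new_skills, new_matrix, counts = sp_reorder(matrix, states, skills)
for state, row in zip(new_states, new_matrix):
    # Print "." for a tested skill and "0" otherwise, as in the tables.
    print(state, "".join("." if cell else "0" for cell in row))
```

The `counts` dictionary plays the role of the summary table of the number of states testing in each skill area.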

The resulting matrices are reported in Tables 4.1-4.12. To 
visually simplify interpretation, a "." is used in place of a "1" 
when a state tested in the given skill area. Thus the meaning of 
the first row of data from Table 4.1 (Reading Grades 1-3) is that 
the state-developed minimum competency test in Alabama (first 
test listed for Alabama in Appendix 15) includes items from all 
five skill areas (word attack, vocabulary, literal comprehension, 
inferential comprehension, study skills). The same holds true for 
California, Hawaii, Kansas, Nevada, South Carolina, and Texas at 
these grade levels. Twenty-eight states do not test in grades 1-3 
and we have no information about Tennessee's test. None of the 






remaining states tested in all five skill areas according to the 
table. 
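
Reading a table row amounts to pairing each character with the skill code printed above its column. The sketch below is an illustration only: it assumes the column order I, L, V, W, S printed for Table 4.1, and the four patterns are written out from the descriptions given in this chapter for Alabama, Louisiana, Florida, and Arkansas. The names `SKILL_ORDER`, `SKILL_NAMES`, and `decode` are hypothetical.

```python
SKILL_ORDER = "ILVWS"  # column order as printed for Table 4.1 (assumed here)
SKILL_NAMES = {
    "I": "inferential comprehension",
    "L": "literal comprehension",
    "V": "vocabulary",
    "W": "word attack",
    "S": "study skills",
}


def decode(pattern, order=SKILL_ORDER):
    """Return (tested, untested) skill names for a row such as '.0...'."""
    tested = [SKILL_NAMES[s] for s, ch in zip(order, pattern) if ch == "."]
    untested = [SKILL_NAMES[s] for s, ch in zip(order, pattern) if ch == "0"]
    return tested, untested


rows = {
    "Alabama (first test)": ".....",
    "Louisiana": "0....",
    "Florida": ".0...",
    "Arkansas": "..0..",
}

for state, pattern in rows.items():
    tested, untested = decode(pattern)
    print(f"{state}: {len(tested)} skill areas tested; not tested: {untested}")
```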

Interpretation of skill area emphasis proceeds in a similar 
fashion. According to Table 4.1, items in the skill area of 
inferential comprehension were included in the most states (21) 
while study skills items (S) were included in the fewest (11). 
Note that a different skill ordering can occur at other grade 
intervals. For example, word attack skills were tested in the 
fewest states at grades 4-6 (and other grades for that matter). 

One more feature of these tables deserves mention before 
proceeding with an examination of the results. The skill areas 
covered in some states are atypical for states testing in a given 
number of skill areas. For example, although inferential 
comprehension was the most popular skill area, Louisiana's test 
for grades 1-3 contains no items in this skill area but tests in 
all four remaining areas. Florida's test apparently contains no 
literal comprehension items though the remaining skill areas are 
covered. When this type of analysis is applied to student test 
item responses, an atypical pattern is usually interpreted to 
reflect spotty student learning, guessing, or fundamental 
misunderstandings of certain concepts. In the present case, 
these atypical patterns could reflect a state's personalized 
curriculum emphasis, or perhaps simply the inadequacy of our 
classification efforts. We will try to note the occurrence of such 
patterns as we consider the various tables. 
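
One simple way to flag such rows, once the columns are ordered from most to least frequently tested, is to look for an untested skill printed to the left of a tested one. The sketch below is a stand-in for illustration and is not the index used in the S-P chart literature; the function name `is_atypical` is hypothetical.

```python
def is_atypical(pattern):
    """True when an untested skill ('0') appears left of a tested skill ('.').

    The pattern is assumed to be written in permuted column order,
    most frequently tested skill first.
    """
    tested = [i for i, ch in enumerate(pattern) if ch == "."]
    untested = [i for i, ch in enumerate(pattern) if ch == "0"]
    if not tested or not untested:
        return False
    return min(untested) < max(tested)


# Patterns written out from the descriptions in the text (column order I, L, V, W, S).
examples = {
    "Louisiana": "0....",   # no inferential comprehension items
    "Florida": ".0...",     # no literal comprehension items
    "typical four-area row": "....0",
}

for label, pattern in examples.items():
    print(label, pattern, "atypical" if is_atypical(pattern) else "typical")
```

A row that omits only the least popular skill areas is not flagged, which matches the sense in which the Louisiana and Florida rows stand out.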

Reading. We will consider each grade cluster separately, 
focussing on main trends and unique patterns of coverage. The 
discussion of grades 1-3 (Table 4.1) was basically provided in 
our examples. Only 22 states even test in this grade span (note 
Alabama has 2 testing programs); those that do tend to include 
items from every area except study skills. In addition to the 
atypical patterns of testing already mentioned in Florida and 
Louisiana, Arkansas's test does not include vocabulary items but 
tests in the remaining areas. 

There are 41 separate testing programs operating in the area 
of reading at grades 4-6 (Table 4.2); 3 states (Alabama, South 
Carolina, and Wisconsin) maintain 2 separate programs in this 
grade span. At least 18 programs test in all 5 skill areas while 
only 11 states do not test at all. A majority of states test in 
every skill area except word attack skills. The only apparent 
anomaly is again Arkansas's lack of coverage of vocabulary while 
testing in the remaining areas. 

In grades 7-9 (Table 4.3), there are 42 separate test 
administrations (and 36 states testing) in reading. At least 20 
programs test in all 5 skill areas while 12 states did not report 
testing at this grade span as of Fall 1985. (Subsequently, 
Indiana and South Dakota have started testing in grade 8.) Only 
word attack skills are tested in fewer than half the states while 
items on inferential and literal comprehension appear on at least 





TABLE 4.1 

STATE TESTING PROGRAMS            READING CONTENT INDICATORS 
ANALYSIS OF READING GRADES 1-3    DATE: JULY 1985 

           Skills   Skill                 Skills   Skill 
States     Tested   ILVWS      States     Tested   ILVWS 

01AL1        5      .....      15IA         0 
05CA         5      .....      19ME         0 
11HI         5      .....      21MA         0 
16KS         5      .....      22MI         0 
28NV         5      .....      23MN         0 
40SC         5      .....      24MS         0 
43TX         5      .....      25MO         0 
01AL2        4      ....0      26MT         0 
03AZ         4      ....0      27NE         0 
04AR         4      ..0..      29NH         0 
08DE         4      ....0      30NJ         0 
09FL         4      .0...      34ND         0 
17KY         4      ....0      35OH         0 
18LA         4      0....      36OK         0 
20MD         4      ....0      37OR         0 
31NM         4      ....0      39RI         0 
33NC         4      ....0      41SD         0 
38PA         4      ...0.      42TN         0      NNNNN 
48WV         4      ....0      44UT         0 
14IN         3      ...00      45VT         0 
10GA         2      ..000      46VA         0 
32NY         1      .0000      47WA         0 
02AK         0                 49WI         0 
06CO         0                 50WY         0 
07CT         0 
12ID         0 
13IL         0 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Testing 

I          21       41.2 
L          20       39.2 
V          19       37.3 
W          18       35.3 
S          11       21.6 

NOTES: 

1) W = Word Attack 
   V = Vocabulary 
   L = Literal Comprehension 
   I = Inferential Comprehension 
   S = Study Skills 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.2 

STATE TESTING PROGRAMS            READING CONTENT INDICATORS 
ANALYSIS OF READING GRADES 4-6    DATE: JULY 1985 

           Skills   Skill                 Skills   Skill 
States     Tested   ILVSW      States     Tested   ILVSW 

01AL1        5      .....      44UT         4      ....0 
02AK         5      .....      46VA         4      ....0 
05CA         5      .....      47WA         4      ....0 
08DE         5      .....      48WV         4      ...0. 
11HI         5      .....      10GA         3      ...00 
16KS         5      .....      13IL         3      ...00 
17KY         5      .....      14IN         3      ...00 
18LA         5      .....      19ME         3      ..0.0 
22MI         5      .....      25MO         3      ..0.0 
23MN         5      .....      49WI1        3      ..0.0 
26MT         5      .....      07CT2        2      ..000 
28NV         5      .....      32NY         1      .0000 
29NH         5      .....      06CO         0 
31NM         5      .....      12ID         0 
37OR         5      .....      15IA         0 
40SC1        5      .....      21MA         0 
40SC2        5      .....      24MS         0 
49WI2        5      .....      27NE         0 
01AL2        4      ....0      30NJ         0 
03AZ         4      ....0      34ND         0 
04AR1        4      ..0..      35OH         0 
04AR2        4      ....0      36OK         0 
07CT1        4      ....0      39RI         0      NNNNN 
09FL         4      ...0.      41SD         0 
20MD         4      ....0      42TN         0      NNNNN 
33NC         4      ....0      45VT         0      NNNNN 
38PA         4      ....0      50WY         0 
43TX         4      ....0 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Tested 

I          40       72.7 
L          39       70.9 
V          34       61.8 
S          33       60.0 
W          21       38.2 

NOTES: 

1) W = Word Attack 
   V = Vocabulary 
   L = Literal Comprehension 
   I = Inferential Comprehension 
   S = Study Skills 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.3 

STATE TESTING PROGRAMS            READING CONTENT INDICATORS 
ANALYSIS OF READING GRADES 7-9    DATE: JULY 1985 

           Skills   Skill                 Skills   Skill 
States     Tested   ILVSW      States     Tested   ILVSW 

01AL1        5      .....      47WA         ? 
02AK         5      .....      43TX         4      ....0 
05CA         5      .....      04AR         3      ...00 
08DE         5      .....      10GA         3      ...00 
09FL         5      .....      13IL         3      ...00 
16KS         5      .....      14IN         3      ...00 
17KY         5      .....      19ME         3      ..0.0 
18LA         5      .....      20MD1        3      ..0.0 
22MI         5      .....      28NV         3      ..0.0 
23MN1        5      .....      49WI1        3      ..0.0 
23MN2        5      .....      32NY         1      .0000 
29NH         5      .....      06CO         0 
30NJ2        5      .....      11HI         0      NNNNN 
31NM         5      .....      15IA         0 
37OR         5      .....      21MA         0 
40SC1        5      .....      24MS         0 
40SC2        5      .....      25MO         0 
42TN         5      .....      26MT         0      NNNNN 
48WV         5      .....      27NE         0 
49WI2        5      .....      34ND         0 
01AL2        4      ....0      35OH         0 
03AZ         4      ....0      36OK         0 
07CT         4      ....0      39RI         0      NNNNN 
12ID         4      ....0      41SD         0 
20MD2        4      ....0      44UT         0 
30NJ1        4      ....0      45VT         0 
33NC         4      ....0      46VA         0      NNNNN 
38PA         4      ....0 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Tested 

I          38       69.1 
L          37       67.3 
V          33       60.0 
S          33       60.0 
W          20       36.4 

NOTES: 

1) W = Word Attack 
   V = Vocabulary 
   L = Literal Comprehension 
   I = Inferential Comprehension 
   S = Study Skills 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.4 

STATE TESTING PROGRAMS              READING CONTENT INDICATORS 
ANALYSIS OF READING GRADES 10-12    DATE: JULY 1985 

           Skills   Skill                 Skills   Skill 
States     Tested   ILSVW      States     Tested   ILSVW 

01AL         5      .....      02AK         0 
07CT         5      .....      03AZ         0      NNNNN 
18LA         5      .....      04AR         0      NNNNN 
22MI         5      .....      06CO         0 
23MN         5      .....      08DE         0      NNNNN 
29NH         5      .....      12ID         0 
30NJ2        5      .....      14IN         0 
40SC         5      .....      15IA         0 
42TN         5      .....      17KY         0      NNNNN 
05CA         4      ?          20MD         0 
09FL1        4      ?          21MA         0 
11HI         4      ?          24MS         0 
16KS         4      ?          27NE         0 
26MT         4      ?          31NM         0 
30NJ1        4      ?          34ND         0 
33NC         4      ?          35OH         0 
37OR         4      ?          36OK         0 
44UT         4      ?          39RI         0      NNNNN 
09FL2        3      ...00      41SD         0 
10GA         3      ...00      43TX         0 
13IL         3      ..0.0      45VT         0 
19ME         3      ...00      46VA         0      NNNNN 
28NV         3      ...00      47WA         0      NNNNN 
38PA         3      ..0.0      48WV         0      NNNNN 
49WI         3      ...00      50WY         0      NNNNN 
25MO         1      .0000 
32NY         1      .0000 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Tested 

I          27       51.9 
L          25       48.1 
S          22       42.3 
V          20       38.5 
W          10       19.2 

NOTES: 

1) W = Word Attack 
   V = Vocabulary 
   L = Literal Comprehension 
   I = Inferential Comprehension 
   S = Study Skills 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



37 tests. The patterns of content coverage are highly consistent 
across all states testing during this grade span. 

Fewer testing programs are operated at grades 10-12 than at 
grades 4-6 and 7-9 (Table 4.4). There are 36 programs operating 
in 35 states, according to our data (Note that we failed to 
receive specimen sets from several commercial tests at this grade 
span.). Inferential and literal comprehension are still the most 
popular testing areas while coverage of vocabulary has dropped 
and word attack skills virtually disappeared. Again, patterns of 
content coverage are relatively uniform (some states test in 
vocabulary but not study skills). 

If only one reading skill area and grade level were to be 
included in the exploratory study, the choice apparently boils 
down to either inferential or literal comprehension at either 
grades 4-6 or grades 7-9. An examination of the detailed 
summaries in Appendix 15 (and our study files) for these grade 
spans suggests that literal comprehension is likely to be a better 
skill area for the study. The basis for this judgment is some 
indication of greater uniformity across states in subskills 
tested in literal comprehension. When we examined our earlier 
descriptions of grades tested and dates of test administration 
more carefully, it appeared that there was more uniformity of 
practice in the older grade span where spring testing in grade 8 
predominates. We return to this discussion of target grades and 
content areas later. 
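
The narrowing of the choice can be retraced from the program counts alone. The sketch below simply ranks the two comprehension skill areas across grade spans; the counts are as read from the Skills Statistics panels of Tables 4.1-4.4, and the variable names are hypothetical. It covers only the coverage criterion, not the judgments about subskill uniformity and testing dates discussed above.

```python
# Number of testing programs covering each comprehension skill, by grade span,
# as read from the Skills Statistics panels of Tables 4.1-4.4.
programs = {
    ("1-3", "inferential"): 21,
    ("1-3", "literal"): 20,
    ("4-6", "inferential"): 40,
    ("4-6", "literal"): 39,
    ("7-9", "inferential"): 38,
    ("7-9", "literal"): 37,
    ("10-12", "inferential"): 27,
    ("10-12", "literal"): 25,
}

# Rank grade-span/skill combinations by how many programs cover them.
ranked = sorted(programs, key=programs.get, reverse=True)
shortlist = ranked[:4]

for grades, skill in shortlist:
    print(f"grades {grades}, {skill} comprehension: {programs[(grades, skill)]} programs")
```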

Mathematics. There are only 25 separate testing programs in 
22 states in mathematics at grades 1-3 (Table 4.5). Most states 
operating a testing program test in the skill areas of numbers 
and numeration and measurement. According to our data, New York 
has a somewhat unusual topic coverage, skipping measurement and 
geometry but testing in statistics (the only state to do so at 
this grade span). 

In grades 4-6 (Table 4.6), 39 testing programs in 
mathematics are administered by 36 states. At least 9 states test 
in all five skill areas and at least half the states test every 
area except statistics. Numbers and numeration and measurement 
are most frequently tested. Again, New York's apparent interest 
in statistics and lack of interest in measurement is the only 
atypical pattern. 

Forty-two (42) testing programs in mathematics are 
administered by 36 states in grades 7-9 (Table 4.7). At least 34 
states test in 4 skill areas with numbers and numeration, 
measurement, and geometry the most popular. New York still 
avoids measurement at this grade span while Florida does not test 
in the geometry area. 

Just as in reading, the number of testing programs drops 
rapidly in mathematics at grades 10-12 (Table 4.8). Eighteen 
states do not administer a mathematics test at this grade span. 
Numbers and numeration is still the most popular skill area, but 






TABLE 4.5 

STATE TESTING PROGRAMS            MATH CONTENT INDICATORS 
ANALYSIS OF MATH GRADES 1-3       DATE: JULY 1985 

[State-by-skill entries are illegible in the scanned copy.] 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Testing 

N          23       43.4 
M          21       39.6 
V          19       35.8 
G          13       24.5 
S           1        1.9 

NOTES: 

1) N = Numbers & Numeration 
   V = Variables 
   G = Geometry 
   M = Measure 
   S = Statistics 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.6 

STATE TESTING PROGRAMS            MATH CONTENT INDICATORS 
ANALYSIS OF MATH GRADES 4-6       DATE: JULY 1985 

[State-by-skill entries are illegible in the scanned copy.] 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Testing 

N          36       67.9 
M          35       66.0 
G          33       62.3 
V          27       50.9 
S           9       17.0 

NOTES: 

1) N = Numbers & Numeration 
   V = Variables 
   G = Geometry 
   M = Measure 
   S = Statistics 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.7 

STATE TESTING PROGRAMS            MATH CONTENT INDICATORS 
ANALYSIS OF MATH GRADES 7-9       DATE: JULY 1985 

[State-by-skill entries are illegible in the scanned copy.] 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Tested 

N          36       64.3 
G          34       60.7 
M          34       60.7 
V          32       57.1 
S          18       32.1 

NOTES: 

1) N = Numbers & Numeration 
   V = Variables 
   G = Geometry 
   M = Measure 
   S = Statistics 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.8 

STATE TESTING PROGRAMS            MATH CONTENT INDICATORS 
ANALYSIS OF MATH GRADES 10-12     DATE: JULY 1985 

[State-by-skill entries are illegible in the scanned copy.] 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Testing 

N          22       43.1 
V          19       37.3 
G          19       37.3 
M          19       37.3 
S          13       25.5 

NOTES: 

1) N = Numbers & Numeration 
   V = Variables 
   G = Geometry 
   M = Measure 
   S = Statistics 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



the differences in emphasis among variables, geometry, and 
measurement have disappeared. New York still excludes measurement 
while Hawaii includes it but excludes variables and geometry. 

As in reading, the choices for the exploratory study are 
between two skill areas (numbers and numeration or measurement) 
at two grade spans (4-6 or 7-9). An examination of the detailed 
summaries of content coverage does not provide much guidance in 
choosing between the two topics although New York would be 
excluded if measurement were the chosen area. The choice among 
grade spans must again rely on a more detailed examination of 
testing conditions as there are 36 states administering testing 
programs in either grade span. Spring testing in grade 8 occurs 
most frequently here as it does in reading. 

Writing. We will devote less time to the discussion of 
writing because testing in this area is less widespread than in 
mathematics or reading and the Panelists expressed less interest 
in this area for that reason. Moreover, a note of caution is 
warranted about overinterpreting the results on the prevalence of 
writing at the various grade levels. Virtually all of the content 
classified as writing comes from indirect writing assessments 
rather than from writing samples. In fact much of this content is 
what might also be called language arts (conventions or grammar). 

Despite the increased emphasis in recent years on direct 
writing assessments, the pattern of testing in this area is still 
quite poor (Tables 4.9-4.12). Only in the areas of conventions, 
word usage, and grammar do as many as half the states test and 
even then only in the grade spans 4-6 and 7-9. Only 3 or 4 states 
include items in all five skill areas at any given grade span. 
The collection of writing samples occurs infrequently even at the 
higher grades, with the roughly 15 states collecting this data at 
grades 7-9 representing the largest sample of participating 
states. With the renewed interest in critical thinking coming on 
top of the interest in direct writing assessment, this area of 
testing should continue to grow and change in the coming years. 

Exemplary Practices 

Before proceeding to the recommendations regarding skill 
areas and grades proposed for the exploratory study, we want to 
briefly highlight exemplary practices that emerged in our 
examination. Three different aspects of practice will be 
emphasized: spread of items across subskills, depth of coverage 
within subskills, and significant coverage of higher order 
skills. 

Significant numbers of states spread test items across a 
wide range of skill areas in at least one content area. The 
breadth of coverage was greatest in reading; 11 separate states 
were identified that exhibited broad coverage for at least one 
grade span. Alabama, California, Kansas, Florida, Louisiana, 
Michigan, Minnesota, New Hampshire, South Carolina, and 
Tennessee had the most instances of tests with broad coverage. 






TABLE 4.9 

STATE TESTING PROGRAMS            WRITING CONTENT INDICATORS 
ANALYSIS OF WRITING GRADES 1-3    DATE: JULY 1985 

           Skills   Skill                 Skills   Skill 
States     Tested   CWGOS      States     Tested   CWGOS 

05CA         4      ....0      21MA         0 
20MD         4      ....0      22MI         0 
43TX         4      ...0.      23MN         0 
01AL1        3      ..0.0      24MS         0 
01AL2        3      ...00      25MO         0 
03AZ         3      ...00      26MT         0 
08DE         3      ...00      27NE         0 
11HI1        3      ...00      29NH         0 
14IN         3      .0.0.      30NJ         0 
17KY         3      ...00      32NY         0 
18LA         3      ..00.      34ND         0 
28NV         3      ...00      35OH         0 
31NM         3      ...00      36OK         0 
33NC         3      ...00      37OR         0 
48WV         3      ...00      38PA         0 
09FL         2      .00.0      39RI         0 
02AK         0                 40SC         0 
04AR         0                 41SD         0 
06CO         0                 42TN         0 
07CT         0                 44UT         0 
10GA         0                 45VT         0 
11HI2        0      NNNNN      46VA         0 
12ID         0                 47WA         0 
13IL         0                 49WI         0 
15IA         0                 50WY         0 
16KS         0 
19ME         0 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Tested 

C          16       30.8 
W          14       26.9 
G          13       25.0 
O           4        7.7 
S           3        5.8 

NOTES: 

1) C = Conventions (e.g., spell., capit., punct.) 
   G = Grammar (sentence structure) 
   W = Word Usage 
   O = Organization 
   S = Writing Sample 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.10 

STATE TESTING PROGRAMS            WRITING CONTENT INDICATORS 
ANALYSIS OF WRITING GRADES 4-6    DATE: JULY 1985 

[State-by-skill entries are illegible in the scanned copy.] 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Tested 

C          28       52.8 
W          24       45.3 
G          22       41.5 
O          19       35.8 
S           7       13.2 

NOTES: 

1) C = Conventions (e.g., spell., capit., punct.) 
   G = Grammar (sentence structure) 
   W = Word Usage 
   O = Organization 
   S = Writing Sample 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



TABLE 4.11 

STATE TESTING PROGRAMS            WRITING CONTENT INDICATORS 
ANALYSIS OF WRITING GRADES 7-9    DATE: JULY 1985 

[State-by-skill entries are illegible in the scanned copy.] 

**** SKILLS STATISTICS **** 
Permuted 

Skill    No. of    Percent 
Code     States    Tested 

C          25       45.5 
G          23       41.8 
W          23       41.8 
O          21       38.2 
S          12       21.8 

NOTES: 

1) C = Conventions (e.g., spell., capit., punct.) 
   G = Grammar (sentence structure) 
   W = Word Usage 
   O = Organization 
   S = Writing Sample 

2) For states with more than 1 testing program at a given range 
   of grade level, multiple sets of codes are provided and tests 
   are labeled by numbers as well as state indicated (e.g., AL1, 
   AL2). 

3) For states for whom test content specifications were not 
   available at the time of coding, the code N (no data) is 
   reported in the table. 

4) The number of states in a given skill area includes all test 
   versions from a state and excludes states for whom test 
   specifications were not available at the time of coding. 



STATE TESTING PROGRAMS WRITING CONTENT INDICATORS 
ANALYSIS OF WRITING GRADES 10-12 DATE) JULY 1985 



9 

ERIC 





Skills 


Skill 




Skills 


Skill 


States 


Tested 


CWOGS 


States 


Tested 


CWGOS 


07CT 
















i 


17KY 


0 




10 LA 
















i 


20MD 


0 




3 ON J 




i 














21NA 


0 




09FL 




C A 

... .0 












1 




0 


NNNNN 


1 3IL 




1 n 

! . . ..0 












I 


23NN 


0 


MNNNN 


* 5 MO 




n 

! 














24MS 


f\ 




J SPA 




• • • • " 












i 


2 7NE 


0 




4 2m 




/I 

... .0 












1 


29NH 


0 




ClAL* 


3 


• • .00 














31NM 


0 




05CA 


3 


• 0. .0 














33NC 


0 






26MT 


3 


• .0.0 














34ND 


0 




4 4UT 


3 


• • .00 














3 5 OH 


0 




i ami 




0000. 














360K 


u 








0000 . 














39RI 


0 




28NV 


1 


0000. 














40SC 


0 


NNNNN 


32NY 


1 


0000. 














41SD 


,1 




370R 




0000. 














43TX 


0 




02AK 


o 
















45VT 


o 




03AZ 


0 j 
















46VA 


o 




04AR 


o 
















47WA 


. 




06CO 


o 
















46WV 


o 


NNNNN 


08DE 


o 
















49WI 


o 


NtMNN 


XOQA 


















50WY 


o 


NNNNN 


12ID 


o 








14IN 


o 


; 






15IA 


o 


: 

1 






16RS 


o 


m 





TABLE 4.12

**** SKILLS STATISTICS ****

Permuted
Skill     No. of    Percent
Code      States    Tested

C         12        24.0
W         11        22.0
O         11        22.0
G         10        20.0
S          8        16.0

NOTES:

1) C = Conventions (e.g., spell., capit., punct.)
   G = Grammar (sentence structure)
   W = Word usage
   O = Organization
   S = Writing sample

2) For states with more than 1 testing program at a given range
   of grade level, multiple sets of codes are provided and tests
   are tabled by numbers as well as state indicated (e.g., AL1,
   AL2).

3) For states for whom test content specifications were not
   available at the time of coding, the code N (no data) is
   reported in the table.

4) The number of states in a given skill area includes all test
   versions from a state and excludes states for whom test
   specifications were not available at the time of coding.
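The "Percent Tested" column can be checked against the state counts. The short sketch below does this for the grades 10-12 writing table. The denominator of 50 is an assumption inferred from the reported figures (12 / 50 = 24.0); the report itself does not state the base used, and the names are illustrative only.

```python
# Counts of states testing each writing skill at grades 10-12 (Table 4.12).
counts = {"C": 12, "W": 11, "O": 11, "G": 10, "S": 8}

# ASSUMPTION: a base of 50 is inferred from the reported percentages;
# the report does not state the denominator explicitly.
ASSUMED_BASE = 50


def percent_tested(n_states, base=ASSUMED_BASE):
    """Return the share of the base that tests a skill, to one decimal."""
    return round(100.0 * n_states / base, 1)


percents = {code: percent_tested(n) for code, n in counts.items()}

for code, pct in percents.items():
    print(f"{code}: {counts[code]:>2} states, {pct:.1f} percent")
```

Under that assumed base, the computed values agree with the percentages printed in the table.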






Very few states exhibited depth of coverage (many items per
subskill) in more than one content area. California has deep
coverage everywhere by our criteria, while Alabama and Minnesota
exhibited deep coverage in reading and math (Connecticut may have
as well, but we did not complete the coding of its reading
assessment). Most of the states that had broad coverage also
managed to include a large number of test items for at least
one skill area.

The testing of higher order skills is perhaps of greatest 
interest. At least 14 states included significant numbers of 
higher order skill items on their tests. California, Connecticut, 
Illinois, Kansas, Michigan, New York, Alabama, Oregon, 
Pennsylvania, and Indiana (new test) appear to stand out in this 
area. 

Several states appear to have strong tests across the board. 
States with extensive, long-standing internally developed tests 
(e.g., California, Connecticut, Florida, Michigan, Kansas, 
Minnesota, Pennsylvania, and Illinois) tend to fare best 
according to our criteria. But there were several surprises. The
positive showing of programs in Alabama, Louisiana, and South
Carolina suggests that region of the country is not a determining
factor in testing program quality. New York's well-respected
testing programs do not compare favorably by our criteria, but
this could simply reflect a lack of information on our part.

One other point is worth noting. Generally, states that
emphasize commercially available standardized tests do not fare
well by the criteria we have used to characterize exemplary
practices. Their performance may simply be underrated because we
lacked test copies at the higher grades for most standardized
tests. Or it could be an indication of these tests' conservative
content strategy when compared with the presumably more locally
sensitive tests developed directly by states.

Despite the somewhat rosy picture for testing of higher
order skills in some states, most states have too little coverage
in these skill areas to mount a broad-based exploratory study.
This is unfortunate if well-developed higher order skills are
indeed the focus of the new curriculum reforms, as it will be
difficult to monitor the effects of reforms on these skills
without more extensive test coverage at higher levels.

Summary and Recommendations

Our discussions in this chapter barely scratch the surface
of the details of content of existing state tests and of tests
just over the horizon in many states. Yet we have conducted by
far the most extensive examination of the content of state tests
to date. (Subsequent to the completion of our data collection,
the Office of Technology Assessment contracted with Northwest
Regional Laboratory to carry out yet another survey of state
testing programs with an even more detailed focus on content
coverage and changes in coverage over time. The results of that
study are not yet available.)

While we were unable to carry out work to the point of
developing an explicit set of indicators of content coverage, we
were able to home in on target areas and grades for the proposed
exploratory study. After a careful examination of the test
content data, information about grades tested, and dates of test
administration, the best candidates for the study appear to be
the areas of literal comprehension in reading and either numbers
and numeration or measurement in mathematics at grades 7-9. The
basic reasons for the content choices have already been provided
(primarily frequency of testing at the target grades). The
decision to focus on the same grade span in both content areas is
an attempt to reduce complexity and costs and to disrupt as few
schools in a given state as possible.

The choice of grades 7-9 over grades 4-6 is based primarily
on the number of deviations within the grade span from the single
most frequent grade/test administration date combination. The
grade level most frequently tested is grade 8, while the states
testing in the grade 4-6 range are more evenly spread across
grades.

Table 4.16 summarizes the testing conditions of states in
grades 7-9 as of Spring 1985. Of the 40 states that test at
grades 7-9 (or plan to do so soon), 25 administer their tests
in the spring to students in the 8th grade. This leaves only 15
states currently testing in this grade span that would have to
change their grades for testing, change their time of year, or
do both. The other alternative for these states is to carry out
the special studies of testing conditions to estimate the
adjustments necessary to align their performance with that of
spring testing of 8th graders. There are only three states
(Michigan, Nevada, and West Virginia) in which both grade and
date of testing do not match the target testing conditions.

The set of states that currently test in the proposed skill
areas during grades 7-9 is depicted in Figure 4.1. Note that
New York would be eliminated due to its idiosyncratic content
coverage at this grade span. Without any modifications of current
practices, comparisons would be workable in the South, the
Far West, the East, and the Upper Midwest. As programs in states
just starting their own assessments begin to develop, the picture
will be even better. For instance, the states of Wyoming,
Indiana, and South Dakota are just starting to collect testing
data and Mississippi is due to begin by 1987. The trend is
clearly in the direction of more testing and greater conformity
in testing practices.



Page 4.2? 



TABLE 4.13 



7/24/85 

READING 



States w/wide "spread" across subskills:


(by grade level) 






1-3 


4-6 


7-9 


10-12 


These states 


AL 


AL 


AL 


AL 


have 2 or more 


CA 


OR 


MN 


LA 


subskills in every






OR 


TN 


skill area 






TN 




These have 1 or 


FL 


SC 


AL 


FL (; 


more subskills


KS 


AL 


CA 


MI 


In each skill 


SC 


CA 


FL 


MN 


area 


TX 


KS 


KS 


NH 






LA 


LA 


SC 






MI 


MI 


TN 






MK 


NH 








MT 


NJ (phasing out) 








NH 


SC 





2 tests combined) 



States w/most "depth" - i.e., most items per subskill

CA [e.g., grade 1-3: WA = 60/3, Voc = 30/2, LC = 73/3, IC = 77/4, SS = 30/2]
MI
MT

NY - only on inferring the word that goes in the blank (entire reading test is this format)
States w/emphasis on higher order subskills ("IC"): (lots of items and/or lots of subskills)





1-3 


4-6 




7-9 




10-12 




IN 


30/6 


CA 


78/16 


CA 


235/15 


CA 


50/5 


NY 


56/1 


MI 


27/8 


IL 


10/7 


KS 


21/7 


CA 


77/4 


NY 


77/1 


IN 


35/7 


LA 


24/6 


SC 


/8 


IN 


35/7 


KS 


15/5 


MI 


26/8 








/8 


MI 


24/7 


MO 


12/6 






CT 


24/11 


CT 


/16 


MT 


20/6 



NJ 43/11 

NY 77/1 

PA 34/7 

States w/items on attitude toward reading:

Michigan (15 items at 4-6, 7-9, 10-12)
Montana (15 items at 4-6, 10-12)
Connecticut (CAEP)



Page 4,25 






TABLE 4.14 



7/24/85 



MATH 



States with wide spread across 5 skill areas (by gr. level): 



1-3 




4-6 




7-9 




No states Included 


AL 


have 3 or 


AL 


have 4 or 


AL 


"statistics" at 


CA 


more Items 


CA 


more Items 


CA 


this grade level. 


GA 


1n each of 


LA 


1n each of 


m 


IN 


5 skill areas 


MN 


5 skill areas 




AL have 4 or 


KS 




CT 






CA more Items 


MO 










IN In each of 


CT 




GA 


1-3 Items is 


IK 


KS other skll 1 






NH 


lowest amt. 


GA 


LA areas 






NJ 


1n any of 5 


MI 








PA 


skll 1 areas 


MO 








TX 




NH 



PA 
TN 
CT 



States with "depth" (most items per subskill)
the most 



CA 
AL 
FL 



ID 
KS 
LA 
MI 
Mh 
CT 



usually a lot of items in "#s & Numeration"



States with emphasis on higher order subskills (3* & 4* in following chart)
(lots of items and/or lots of subskills)



r-3 

CA 20/4 37/5 



4-6 



7-9 



States with Items on attitude toward math: 
CONNECTICUT 

Also - only CT had items on computers and calculators, plus some items on
computer literacy in its lang. arts section of the CAEP test.



10-12 

4 or more
items in
each of 5
skill areas



1-3 items 
is lowest 
amt. In 
any of 5 
skill areas 



10-12 





















CA 


71/7 


70/4 


AL 


8/2 


44/5 


AL 


6/2 


23/3 


MT 




28/5 


CA 100/11 


105/5 


CA 




55/6 


PA 


13/4 


17 


FL 




10/1 


FL 




40/2 


CT 


88/(5 


16/1 


IL 


2/2 


17/3 


IL 


3/1 


16/5 








KS 




9/2 


KS 




51/3 








NJ 


1/1 


19/5 














3/1 


31/4 














OR 


10/4 


15/3 


MN 


[?] 


29/3 








ct 


36/6 


20/4 


MT 
OR 

a 


"1 
3/2 


47/5 
20/2 
10/4 






Page 4.26 



87 



TABLE 4.15 



WRITING



7/24/85 



STATES WITH WRITING SAMPLES:





Gr. 1 


-3 4 - 


(new) 1 


Idaho 




2 


Indiana


X X 


3 


Louisiana 


X 


4 


Maine 


X 


5 


Nevada 




6 


New Jersey 




7 


New York 


X 


8 


Oregon 


X 


9 


Texas 


X X 


10 


Maryland 




11 


Connecticut 


X 



STATES WITH QUESTIONS ON
ATTITUDES TOWARD WRITING 



Illinois 

Montana 

Connecticut 

STATES WITH "SPREAD" ACROSS 
WRITING CONTENT: 

California (esp. 1-3, 4-6 

and 7-9) 
Connecticut 
Florida 

Illinois (esp. 7-9, 10-12) 

New Jersey 

Oregon 

Pennsylvania (voluntary test) 
Tennessee 



Scoring Method* 

7-9 10 - 12 



X ? 

X H 

XX P 

X X H.P.A 

X > X H 

X > X ? 

XX H 

X X H 

X ? 

X ? 

X X H,A 

STATES WITH 
"DEPTH" 



1. California - has most items per area

2. Alabama - medium amount of items per area

HIGHER ORDER WRITING SKILLS OTHER
THAN WRITING SAMPLE ("OR" & "SM" columns)

California (judge student writing on specifics)
Connecticut (take notes; ID missing info. on outline)
Illinois (editing in 8th & 10th grades)
Oregon, Alabama (fill out forms; letter format)
Pennsylvania (judge relevance, gr. 5, 8 & 11)



*Scoring Method Key:

H = Holistic     P = Primary Trait
A = Analytic (Diagnostic Checklist)
? = Not specified in documents



Page 4.27 



TABLE 4.16 



State Testing Conditions in Reading and Mathematics 
Grades 7-9 as of Spring 1985 



States Testing in Grade 8 During Spring (Feb-May) (N=25)

Alabama (formerly CAT, now SAT)
Alaska (every 2 years)
Arizona (CAT)
California
Delaware (CTBS)
Florida (every 2 years, MCT)
Georgia (ITBS)
Idaho (MCT)
Illinois
Indiana (beginning Feb 1985, MCT)
Kansas (MCT)
Kentucky (CTBS)
Missouri (MCT)
Montana
New Mexico (CTBS)
New York (MCT)
Pennsylvania
Rhode Island (ITBS)
South Carolina (MCT)
South Dakota (beginning April 1985)
Tennessee (formerly HAT)
Virginia (SRA)
Washington (CAT)
Wisconsin (CTBS)
Wyoming (NAEP)



States Testing in Grades 7 or 9 During Spring (Feb-May) (N=7)

Arkansas (7, SRA)
Hawaii (9, MCT)
Louisiana (7)
New Jersey (9, MCT)
North Carolina (9, CAT)
Oregon (7, every yr. 1985+)
South Carolina (7, CTBS)



States Testing in Grade 8 During Fall or Winter (N=6)

Connecticut (CAEP)
Hawaii (SAT)
Maine
Maryland (CAT)
Minnesota
New Hampshire (MCT beg. 1985)



States Testing in Grades 7 or 9 During Fall or Winter (N=3)

Michigan (7, MCT?)
Nevada (9, MCT)
West Virginia (9, CTBS)



No Grade 7 through 9 Testing (N=1)

Utah



No State Testing at Any Grade (N=8)

Colorado
Iowa
Massachusetts
Nebraska
North Dakota
Ohio
Oklahoma
Vermont






Page 4.2C 



83 




[Figure legend: marked states are those which do NOT test for
Literal Comprehension in Reading or Numbers and Numeration and
Measurement in Mathematics]



Chapter 5

Examination of Reporting Practices and Auxiliary Information



Statement of the Problem

Within-state contrasts in achievement could be used to make 
between-state comparisons of performance. There are two types of 
within-state contrasts that could be of special interest: 

1) Longitudinal Contrasts, which examine trends in
achievement test scores over time. There are two types of
longitudinal contrasts that would be of interest:

a) Cohort repetitive trends, in which the same students
are followed year-by-year, from grade-to-grade. For
example, students are tested at Grade 1 in the first
year, then followed over the years to grade 6. Some
states do not track exactly the same students, but 
provide test information for all students at each 
successive grade level. Changes in cohort composition 
are confounded with instructional treatment when the 
data are not for the identical students at each point 
in time. When the data are for identical students, 
attrition may account for some of the observed trends. 

b) Cohort replicative trends, in which successive groups of 
students at a given grade level are tested. For 
example, fourth graders are tested each year in 
reading. Trends over time will be confounded with 
changes in the student population at the grade level(s) 
tested. 

2) Subgroup Contrasts, in which different groups within a
state are contrasted with one another. Contrasting scores of
students in different socio-economic status brackets, or 
contrasting the performance of different racial/ethnic groups are 
examples of contrasts within states that could form the basis of 
state-to-state comparisons. At a minimum, the definitions of the 
subgroups would have to be consistent across states in order to 
permit cross-state comparisons. Although states have federal 
models for some categories of classification (e.g., the Office 
for Civil Rights classification of race/ethnicity) , they may not 
use these consistently in their achievement testing programs. In 
areas with lesser political salience, the definitions of 
subgroups could be quite varied. 

J. Ward Keesling was primarily responsible for the preparation of
this chapter.

Because longitudinal trends may be confounded with changes
in cohort composition, the combination of subgroup and trend
contrasts would provide a basis for more accurate comparison.
However, it is unlikely that many states will have information on
the same subgroups (e.g., grade-level, racial/ethnic status, sex)
tested in the same skill areas, over time. Even if such
information were available, it is not likely to be reported in
the same metric across different states. For example, in our
examination of reports from various states, statewide test
performance was reported using the following metrics: grade
equivalents, percent correct, percent scoring above a specified
passing score, stanines, percentiles, and various standard
scores. While scores reported in some of these metrics are often
confused with each other, none are directly comparable. Moreover,
states seldom report the necessary distributional information
(e.g., standard deviations of performance for each year in a
longitudinal series or for each subgroup in the case of subgroup
contrasts) to permit transformation of reported scores to
standardized units (gains in standard deviation units, subgroup
contrasts expressed as effect sizes) that might be comparable
across states.
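To make concrete why the missing distributional information matters, the sketch below shows one common way of expressing a gain or a subgroup contrast in standard deviation units. It is an illustration only: the function names and all numeric inputs are hypothetical, and the pooled-standard-deviation formula is a conventional choice rather than a procedure specified in this report. Neither quantity can be computed from a state report that gives means alone.

```python
from math import sqrt


def standardized_gain(mean_start, mean_end, sd):
    """Gain between two time points, expressed in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_end - mean_start) / sd


def subgroup_effect_size(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Subgroup contrast as a standardized mean difference (pooled SD)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)


# HYPOTHETICAL values, not taken from any state report.
gain = standardized_gain(mean_start=50.0, mean_end=55.0, sd=10.0)
contrast = subgroup_effect_size(60.0, 55.0, 10.0, 10.0, 100, 100)
print(f"gain = {gain:.2f} SD, subgroup contrast = {contrast:.2f} SD")
```

Two states reporting in different raw metrics could be placed side by side in these units only if each also published the standard deviations (and, for the pooled form, the group sizes) that the functions require.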

A further problem with the mixture of metrics is that there 
is no absolute scale of comparison. If the data available are 
reduced to gains or subgroup contrast effects, there may be no 
way to recognize when one state is experiencing low gains or 
small subgroup contrasts due to ceiling effects, for example. 
However, even the simplest indicator (a + sign indicating gain 
vs. a - sign indicating loss) could serve, over time, as a signal 
that interesting differences were occurring. If blacks in one 
state show achievement gains from year-to-year over 4 to 5 years 
(3 to 4 differences) while blacks in a contiguous state show 
losses, no matter what the metric, there would be reason to
examine the educational programs (and other factors) more 
thoroughly. 
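The sign indicator described above can be sketched in a few lines. The example is illustrative only: the two score series are hypothetical, and the function name is not drawn from the report. It shows that five years of data yield four year-to-year differences, each reduced to a "+" or "-" that does not depend on the reporting metric.

```python
def gain_signs(scores):
    """Reduce a yearly score series to the signs of its year-to-year changes."""
    signs = []
    for previous, current in zip(scores, scores[1:]):
        difference = current - previous
        if difference > 0:
            signs.append("+")
        elif difference < 0:
            signs.append("-")
        else:
            signs.append("0")  # no change between the two years
    return signs


# HYPOTHETICAL five-year series for one subgroup in two neighboring states,
# deliberately reported in different metrics.
state_a_percentiles = [41, 43, 44, 47, 49]
state_b_percent_correct = [68.2, 67.9, 67.1, 66.5, 66.0]

print("State A:", " ".join(gain_signs(state_a_percentiles)))
print("State B:", " ".join(gain_signs(state_b_percent_correct)))
```

A consistent run of "+" in one state against a run of "-" in the other is the kind of pattern the text suggests would justify a closer look, although the signs alone say nothing about the size of the changes.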

The problems with varying metrics are not restricted to the
reporting of achievement. States gather certain types of
auxiliary information using different scales. School
characteristics such as dropout rate, ADA, and type of
community in which the school is located, and student
characteristics such as parental education and occupation, are not
measured in a uniform manner even among the few states that
collect them. Until a greater degree of uniformity of information
collection is attained or some other means are developed to
alleviate the metric problems with auxiliary variables, the use
of state-collected auxiliary information as either additional
indicators of context, resources, processes, and outcomes or as a
basis for subgroup classification for generating within-state
performance contrasts will be severely limited.

Current Collection and Reporting Practices

Setting aside concerns about possible metric differences, 
the question remains whether extant state data can be used to 
generate within-state comparisons of the kinds discussed above. 

During the telephone interviews, state testing program 
representatives were asked whether: 

(a) they report longitudinal or time trend data and 
over what period if they did; 

(b) they report achievement data for different subgroups of 
students, and how these were defined. 

Copies of state reports were examined for evidence that they
contained either trend information or subgroup results on
achievement. The interviews and the examination of reports also
produced data about the auxiliary information collected or
reported as part of state testing programs.

Table 5.1 shows the combination of subgroup and auxiliary
information that was detected in the interviews and/or in the
examination of reports. It should be pointed out that most
states used the subgrouping and auxiliary information to profile
the composition of their student population; relationships
between these characteristics and the achievement scores were not
often explored. Some states collected this information but did
not use it in their reports. This table may underestimate the
information available in raw form in the states because some
data may be collected but not used in reports, and may also have
been missed in the interviews.

Table 5.2 is a more focussed examination of the state-by-state
reporting of subgroup comparisons or longitudinal trends.
It is also based upon the interviews and examinations of the
reports we received. Tables 5.3 and 5.4 summarize the
information in Table 5.2. Table 5.3 shows that 27 states in our
sample of 36 had longitudinal data for a span of at least 3
years. Six states had no trend information, and two others had
it, but did not report it.

Table 5.4 shows that about one third of the states in our
sample of 36 report no information on subgroups. Sex and
racial/ethnic background were the most frequently used
subgroupings. Again, one or two states collect subgroup data but
do not report it.

The next step in our examination of the state reports was to
look at the specific nature of the longitudinal and subgroup
contrasts that were reported to determine if they could form the
basis of state-to-state comparisons. Because we could anticipate
that race/ethnic background classifications might vary by state,
it seemed prudent to focus on gender classification because it
was frequently used and unlikely to vary by state. We chose to
examine all states that had been cited as having both sex
subgrouping and trend data of 3 years or more. This led us to
examine more closely the reports of the following 13 states:
Arizona, California, Connecticut, Louisiana, Maine, Minnesota,
North Carolina, Pennsylvania, Rhode Island, South Carolina,
Texas, Virginia, and Wisconsin. We focussed on the availability
of achievement results for students in grades 7-9, in reading or
math. This grade span was chosen because our analysis of the
state testing programs had shown this to be a popular grade range
in which to test (see Tables 4.1-4.12). We looked for results
on tests of literal comprehension in the reading area and on
measurement or computational skills in the math area in order to
TABLE 1 - STATE TABLE 




Page 5.3 $ t 



TABLE 5.1 



revised 7/24/85



Information 



Auxiliary Information Collected or Reported 
By SUte Testing Program 

States 

At AKAZARCACOCTDEFLGAHI ID IL IN IA KS KY LA t€ fti MTHnii MS MD MT ME NV NH HJ ffl MY MC NO OH OK UR PA W SC SO TM TX UT VT VA HA MV hi NT 



Students 

A. Background 

1. Chapter I 

2. Chapter I - 
Migrant 

3. Ethnic Group 

4. Free Lunch 

5. Grade Repeaters 

6. Langimge Status 

7. Years 1n 
Community 

8. Occupation 
(parent/s) 

9. Parent Educ. 

10. RAP Programs 

11. Sex 

12. Special Ed. 

13. Date of Birth 

14. Years In School 

15. Years In District 

16. Parent Support 

17. Soc-Econ Status 

18. Family Size 

B. Curriculum Expo- 
sure - General 

1. Curriculum 
Track 

2. Homework, 
Hours Spent 

3. Instruction, 
Minutes of 



xxx 



XXX 
X 



xxx 



4* 






States coot. (2) 



Inforitlon AL AK AZ AR CA CO CT Dt a GA Hi ID IL IN IA KS KY LA * » MA MI Ml MS MO MT ME NV NH NJ Iff NY NC HO OH OK OR PA Rl SC SO TM TX OT YT VA MA NV MI IT 

. Students 

C. Student Attitudes 
1 Activities - 
General 

1. Attitude Toward 

Computers x x 

2. Attitude Toward 

School x x x x x 

3. Academic 

Self-Concept x x x 

4. Educational Plans x x x x x 

5. Career Plans x x x 

6. Talk to Parents 

About School xxx 

7. Parental 

Encouragement xxx 

8. TV xx xx xx 

9. Emotional 

Maturity x x 

10. Peer Relations x 

11. Teacher Support x 

12. Peer Support x 

13. Attr. of Success x 

14. School Climate x x x 

15. Test Anxiety x 

D. Student Attitudes I 
Activities - Reading 

1. Read Newspaper xxx ~ 

2. Read for Pleasure x x x j? 

3. Library Books for iO 
Non-School Assgnmt. x x A* 

4. How Mel 1 Student 
Feels S/he's Been 

Taught Reading x x 01 

5. Visit Reading Places x x 






Stgitgs coot. (3) 



Information ALAKARAZCACOCTffFIGAHIIDlLIMAKSKYLA^MDMAW 



Students 

0. Student Attitudes I 
Activities - Reading 

6. Request Exlri 
Reading 

7. Talk About 

Reading x x 

8. Completion of Specific 

English Courses x x 

9. Tine on Homewott In 

English x x 

10. Days of Homework In 
English x 

11. Tests I Quizzes 

In Reading x 

12. Hours/days Reading 

for Class Assignments x x 

13. Like Reading 

x 

E. Student Attitudes * 
Activities - Writing 

1. Write for flwn 

Purposes x x 

2. Write Assignments 

In English Class x x x 

3. Write Assign- 
ments In non- 
English Class x 

4. How Often Write JJ 
for School x x x x i£ 

5. Revise Writing * x x 10 



wl th Students 01 
About Their ^ 



6. Teachers Talk 
with St 

yj Writing 






States cont. (4) 



Inform t Ion 



AlAJCAZARCACOaOEaGAHIlDILIMIAKSKYLAfCM)Mf. MlMNHSWMTMENVMHWf«MYMCMOOH(KURPAIUSCbDTMTXUrVTVAMAWVWlir 



I. Students 

E. Student Attitudes I 
Activities - writing 
7. How Well Student 

Feels S/he's Been 

Taught Writing x x 

F. Student Attitudes I 
Activities - Math 

1. Semesters of 

Math x 

2. Completion of Spec- 

f 1c Hath courses x 

3. Tim? on Homework 

In Hath x 

4. Days of Homework 

in Math x 

5. Tests I Quizzes In 

Math x 

6. Like Math x 

G. Other Specific 
Curriculum Activities 

1. Completion of Specific 

Soc. Studies Courses x 
Z. Completion of Specific 

Foreign Language Courses x 

3. Completion of Specific *0 
Art, Music, Drama Courses x ^ 

4. Completion of Specflc n> 
Science Courses x 



5. 



Tine on Homework In 
Soc. Studies 
Tim? on Homework In 
Science 



ui 



X 



6. 



x 








102 



States cont. (5) 

l!l!23!ilon AL AKAZARCACOCTDCFLGAHI P IL IN Ta KS KY LA )E MO MA MI MN MS MO MT NE NV NH NJ )ti K> NC NO OH OK OR PA Ri SC SO TN TX OT VT VA MA UV HI W 

I. Students 

G. Other Specific 

Curriculum Activities 

7. Days of Homework In 

Soc. Studies x 

8. Days of homework In 
Science x 

9. Tests I Quizzes In 

Science x 

10. Tests I Quizzes In 

Soc. Studies x 

11. Like/Favorite Subj. 

x 

II. Schools 

A. Community Context 

1. District Size x x x 

2. County/City; x 
Region/District; 

ClWParlsh x xxx x x * 

3. Urban/Suburban x « xxx 

4. Community Type x x x 

5. District Loc. x 

B. Soclo-Ecooomlc 
Characteristics 

1. AFDC x 

2. Exceptionality 

3. Mlo/ant CM Id * 

4. District SLS £ 

5. School Slze/ADA x x x x X 

6. Mobility x (D 

7. Free Lunch x 

103 C. Staff I School 
Resources 
1. Number of Pro- 
fessional suff v 



Scale s cont. (6) 



Infonwtlon ^ AZ Ak CA CO CT DE FL GA HI 10 IL IM IA KS KY LA f£ K) HA MI MN MS H9 KT NE 

II. Schools 

C. Staff I School 
Resources 

2. Avg. Pupil /Staff x x 

3. Avg. Teacher 

Salary x 

4. I Teaciiers 

With MA x x 

5. Nunber Pupils 
Tested Per 

Region x 

6. Per Capita 

Income x 

7. Avg. Ed. 

Expenditures x 

8. Per Pupil 

Expendl ares x 

9. Teacher Exper. x x 

10. Courses Offered by 
Curricula^ Field x 

0. Other 

1. Public/Private x 

2. Aosence Rate 

3. Class periods/ 
School day x 

4. X Class tine lost 

to Olsruptlon I !J 
01s traction x x <x{ 

5. I of Teachers fl> 
Pointing out 

Dangers of Drug use x 

6. Drop out Rate x co 



ERIC 105 



x 

x 




106 



TABLE 5,2 
Assessment Report Contents 



State 
ALABAMA 
ALASKA 
ARIZONA 

CALIFORNIA 



Subgroup Info 
NO 

Race /Language 

Race/ Chap 1/Sex 
Language 

Sex/ Language/ 
Parent Ed level/ 
Exposure to math 



Longitudinal Info 
NO 
NO 
4 years 

4 years 



CONNECTICUT Sex/Community/Urban
DELAWARE NO 

FLORIDA Available, but not 

reported (NO) 

GEORGIA Free Lunch/Region/ 

LEA enrollment 

IDAHO NO

ILLINOIS Language 

KANSAS District/Region/ 
School Enrollment 

KENTUCKY NO 

LOUISIANA Sex/Race/Socio-econ/City-parish

MAINE Sex/Type of prog./ 

Language /Race/ 
[grade: on communi terns] 
Region/Community type 

MARYLAND NO 

MICHIGAN Sex 



5 years 

6 years 

(not reported) 

4 years 



4 years 

NO 

2 years 

5 years 

3 yrs 

3 years 



3-5 yrs. 

(not yet reported) 



NO 

2 years 



MINNESOTA 
MISSOURI 



Sex 
NO 



3 years 

Yes /not reported 



Page 5.9 






107 



State 

MONTANA 

NEVADA 

NEW HAMPSHIRE 
NEW JERSEY 
NEW MEXICO 

NEW YORK 



Subgroup Info
NO 

Not reported 
NO 

Urbanism/Classe 

Ethnicity/Language/
Yrs of residence 

Public vs. priv./ 
Community type 



NORTH CAROLINA Sex/Ethnicity/ 

Handicap/Homework/
Region/Chapter 1/ 
Parent educ. 

OREGON NO 
PENNSYLVANIA Race/Sex/District 
RHODE ISLAND Sex/SES 



SOUTH CAROLINA Sex/Race/Chap 1/ 

Free lunch/ Repeater/ 
Handicap/Gifted/District 



Longitudinal Info 
NO ' 
5 years 
NO 

7 years 

3 years 

5 years 
4-6 years 

4 yr/ 2 points 

3 years 

4 years 

4-6 



TENNESSEE 
TEXAS 

UTAH 

VIRGINIA 

WASHINGTON 
WEST VIRGINIA 



NO 

Race/Sex/SES/Spec ed/Program/Language/
Region

Student demography/ 
School sampling/strata 

NO 

Race/Sex/Handicap/
District 

Race/Chap 1/Spec 
program/District 

NO 



NO 

4 years 

6 yrs/3 pts. , 



3 years 
6 years 



3-5 yrs/ 
Not reported 

Not reported 



WISCONSIN 



Sex/Attitudes 
toward subjects 



2-8 yrs 



Page 5.10






108 



TABLE 5.3

State Reports of Longitudinal Trends

                      Span of Years Reported On

                      8    7    6    5    4    3    2    No Report

Number of States      1    2    5    6    7    6    1    8

Cumulative Number     1    3    8   14   21   27   28   36
of States



Notes:

1. 27 have at least 3 years of data they have reported on.

2. One or two don't report trends every year, even if they test
   annually; therefore time points may not be the same in
   number for all these LEAs in the same category.
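The cumulative row of Table 5.3 and the figure of 27 states cited in Note 1 can be reproduced directly from the "Number of States" row. The sketch below is a simple check; the variable names are illustrative and not part of the report.

```python
from itertools import accumulate

# Span of years reported on, and number of states at each span (Table 5.3).
spans = ["8", "7", "6", "5", "4", "3", "2", "No Report"]
number_of_states = [1, 2, 5, 6, 7, 6, 1, 8]

# Running total across the columns, left to right.
cumulative = list(accumulate(number_of_states))

# States reporting a span of at least 3 years (Note 1).
at_least_three_years = sum(
    count
    for span, count in zip(spans, number_of_states)
    if span.isdigit() and int(span) >= 3
)

print("Cumulative:", cumulative)
print("States with 3 or more years reported:", at_least_three_years)
```

The final cumulative value equals the 36 states in the sample, and the eight "No Report" states correspond to the six with no trend information plus the two that had it but did not report it.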



Page 5.11 



109 



TABLE 5.4 

State Reports of Subgroup Information 



Subgroup Typology                 Number of States Reporting

None                              13
Sex                               14
Race/Ethnic background            11
Region 

Language Proficiency 

Socio-Economic Status 

Community Type (e.g., urban vs. rural) 

Chapter 1 participant 

District enrollment 

Handicap 

Type of School Program (may include chap 1 

or handicap) 
Parent Education 

Reported by only one state each: 

School enrollment 

Exposure to instruction 

Years of residence

Public vs. Private school 

Student demography 

Homework 

Gifted 

Repeating a grade 

Attitudes toward subject matter 



TABLE NOTES: 

1. Based on 36 states with interviews or analysed reports. 

2. Category schemes with the same name may be different from 
state- to-state. 



Page 5.12 



no 



make the test content as comparable as possible. When we were
unable to find test results on these subskills, we reported the
results for TOTAL math or TOTAL reading instead.

Despite our attempts to homogenize content, there can still
be considerable variations, so comparisons can only be crude at
best. A brief synopsis of our findings for each state follows:



ARIZONA:

Uses CAT tests in the Spring. Metric: percentile. Grade 8.

Sex contrast: 1984
                 Male    Female
    Reading       60       61
    Math          62       65

Longitudinal Trends:

    Cohort Replicative Design
    Year:         81   82   83   84
    Reading       57   59   60   60
    Math          58   61   62   64

    Cohort Repetitive Design
    Grade:         5    6    7    8
    Year:         81   82   83   84
    Reading       56   56   61   60
    Math          51   60   64   64
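Arizona is the one state in this synopsis for which both longitudinal designs are tabulated, so its figures can illustrate the difference between them. The sketch below computes year-to-year changes for each design. It is illustrative only: the helper name is hypothetical, and because percentile ranks are not an equal-interval scale, the differences should be read as rough indications of direction rather than precise gains.

```python
# Arizona percentile ranks, 1981-84, as tabulated in the synopsis above.
# Replicative: successive grade 8 cohorts. Repetitive: one cohort followed
# from grade 5 (1981) to grade 8 (1984).
replicative = {"Reading": [57, 59, 60, 60], "Math": [58, 61, 62, 64]}
repetitive = {"Reading": [56, 56, 61, 60], "Math": [51, 60, 64, 64]}


def year_to_year_changes(series):
    """Return the change from each year to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


for design_name, design in (("replicative", replicative),
                            ("repetitive", repetitive)):
    for subject, series in design.items():
        changes = year_to_year_changes(series)
        total = series[-1] - series[0]
        print(f"{design_name:12s} {subject:8s} changes={changes} net={total:+d}")
```

The two designs end at the same 1984 grade 8 values but trace different paths to them, which is consistent with the earlier point that each design is confounded with a different source of change in cohort composition.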



CALIFORNIA: 

Uses self-made test in Spring. Metric: Score on special
scale. Grade 8.

Grade 8 testing began in 1984; results were not presented in
the reports available to us for review.

CONNECTICUT:

Testing program modeled after NAEP: not all content areas 

are tested annually. Reports on hand did not or math. 

LOUISIANA: 

No grades tested in range 7-9. Only two time points were 
covered in the 1984 report. 

MAINE: 

Self-made test (or NAEP) given in the Spring. Metric:
Average percent correct. Grade: 8.

Sex contrast: 1982 (percent correct)
                                Male     Female
    Reading & Language Art     70.89     74.26

The Technical Report sent to us did not present longitudinal
data.


Page 5.13 Hi 



MINNESOTA:

Test given in Fall. (Source of items not clear.) Metric:
Average percent correct. Grade: 8.

Sex Contrast by year on TOTAL score:
                 Male    Female
    1977         74.5     78.8
    1981         75.0     80.1

Longitudinal Trend:
                                          1977    1981
    Comprehension of longer discourse     72.9    79.4


NORTH CAROLINA: 

CAT, given in the Spring. Metric: Varies. Grade:

Sex Contrast: 1984 (national percentile)

                    Male   Female
Comprehension        56      63
Math Computation     56      67

Longitudinal Trend (grade equivalent):

Year             81     82     82     84
Reading total    7.8   10.1   10.1   10.1



PENNSYLVANIA:

Self-made test given in Fall. Metric: Mean score. Grade 8.
1982 special report [school samples each year are
volunteers, not a probability sample.]



Sex Contrast:

Reports available did not present sex contrasts.

Longitudinal Trend:

Year:        78     79     80     81
Reading     22.0   27.0   27.1   27.6
Math        32.0   31.8   32.0   32.5



RHODE ISLAND: 

ITBS administered in the Spring. Metric: Median
Percentile Rank. Grade: 8.

Sex Contrast:

Discussed in text, not tabulated. Direction of
difference was mixed within and across grades.

Longitudinal Trend:

Year:                    82   83   84
Reading Comprehension    51   60   56
Computation              52   55   60



Page 5.14



SOUTH CAROLINA: 

CTBS given in the Spring. Metric: Varies. Grade: 7.

Sex Contrast: 1984 (percent above the national median score)

                 Male    Female
Total Reading    41.6     46.0
Total Math       41.2     53.7

Longitudinal Trend (median national percentile):

                 1983    1984
Total Reading    41.9    44.1
Total Math       44.5    51.7



TEXAS:

Self-made test in the Spring. Metric: Percent mastering
content. Grade: 9.

Sex Contrast: 1983

            Male   Female
Reading      77      83
Math         7b      80

Longitudinal Trend:

Year:            80   81   82   83
Measurement      70   69   76   79
Total Reading    70   69   72   80



VIRGINIA: 

SRA Achievement Series in the Spring. Metric: ?
Grade: 8.

No sex contrasts were given in this report. 

Longitudinal data were given for outcomes other than test 
scores. 



WISCONSIN: 

CTBS and self-made test given in Spring. Metric: varies. 
Grade: 8. 1983 Report. 

Sex contrasts were not reported in reading or math.

Longitudinal Trends:

Self-made test (percent correct):

            1983
Reading     74%

CTBS (national percentile):

Year:       76   77   78   79   80   81   82   83
Reading     64   62   57   62   62   62   64   64
Math        72   70   59   61   66   66   72   70



Page 5.15



Analysis of these data shows that reports of state testing
programs are not likely to be a source of information on within-state
contrasts that can be readily used to make state-to-state
comparisons. Of the 13 states we examined more closely, four
produced no trend or sex contrast data in the skill areas of interest.
Among the remaining nine states, six presented sex contrast data
and eight presented trend data.
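
As an illustrative aid (not part of the original analysis), the tallies in the paragraph above can be reproduced from the state synopses with a short Python sketch. The yes/no flags are our reading of each synopsis; the dictionary name `REPORTS` and the function `tally` are hypothetical names introduced only for this example.

```python
# Availability of within-state contrasts, as read from the state synopses.
# Each entry: state -> (tabulated sex contrast?, longitudinal trend?)
REPORTS = {
    "Arizona": (True, True),
    "California": (False, False),
    "Connecticut": (False, False),
    "Louisiana": (False, False),
    "Maine": (True, False),
    "Minnesota": (True, True),
    "North Carolina": (True, True),
    "Pennsylvania": (False, True),
    "Rhode Island": (False, True),   # sex contrast discussed in text only
    "South Carolina": (True, True),
    "Texas": (True, True),
    "Virginia": (False, False),
    "Wisconsin": (False, True),
}


def tally(reports):
    """Count states with neither contrast, with sex contrasts, and with trends."""
    neither = sum(1 for sex, trend in reports.values() if not sex and not trend)
    with_sex = sum(1 for sex, _ in reports.values() if sex)
    with_trend = sum(1 for _, trend in reports.values() if trend)
    return {
        "states": len(reports),
        "neither": neither,
        "sex": with_sex,
        "trend": with_trend,
    }


summary = tally(REPORTS)
print(summary)
```

Keeping the flags in one table makes it easy to rerun the tally if a synopsis is read differently, for example if Rhode Island's text-only discussion were counted as a sex contrast.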

Gender identification was one of the characteristics most frequently
reported by state assessment systems (about one
quarter of all states), yet we found only six states that
reported sex contrasts in the most frequently tested skill areas
and grade span. We concluded that subgroup data that are even
roughly comparable across many states will be very hard to find
in published reports. If the raw data could be obtained, it
might be possible to produce subgroup contrasts in more states,
but the coverage of the nation is likely to be sparse.

Longitudinal trend information was reported by substantially
more state assessment systems (over half have data covering three
years or more). However, when we constrained our examination to
grades 7, 8, and 9 in reading and math, only 60 percent of the
reports gave longitudinal information. We estimate that only 15-20
state testing programs report trend data in reading and math
in this grade span. In this case the archival data in the states
could probably be used to create more within-state trends for
comparative purposes, perhaps covering a significant fraction of
the nation. The question would remain of how to interpret the
results.

The trend information we found revealed generally stable to
increasing scores. It is not possible to compare rates of
increase, however, given the differing metrics of the results.
We don't know how valuable this information would be to state or
national policy makers. The national trend (in recent years)
could be inferred to be stable or rising. But this does not
reveal what students have actually attained, only that they are
attaining as well as (or slightly better than) before. If trends
in different states were very contrastive (negative vs. positive,
since differences in rate cannot be judged on the basis of the
reported data) over several years, it might lead to a search for
explanatory factors.
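
The point that only the direction of a trend, and not its rate, can be compared across differing metrics can be sketched as follows. This is a minimal illustration under stated assumptions: the series are transcribed from the state synopses, while the function `direction` and its `tolerance` threshold (an arbitrary one unit in each series' own metric) are hypothetical and not part of the original analysis.

```python
# Trend series transcribed from the state synopses; note the differing metrics.
SERIES = {
    ("Arizona", "Reading", "percentile"): [57, 59, 60, 60],
    ("Arizona", "Math", "percentile"): [58, 61, 62, 64],
    ("Pennsylvania", "Math", "mean score"): [32.0, 31.8, 32.0, 32.5],
    ("Rhode Island", "Computation", "median percentile rank"): [52, 55, 60],
    ("Texas", "Total Reading", "percent mastering"): [70, 69, 72, 80],
}


def direction(values, tolerance=1.0):
    """Classify the net first-to-last change as rising, falling, or stable.

    The tolerance is an arbitrary illustrative threshold expressed in the
    series' own units, so the labels are comparable but the sizes are not.
    """
    change = values[-1] - values[0]
    if change > tolerance:
        return "rising"
    if change < -tolerance:
        return "falling"
    return "stable"


results = {key: direction(values) for key, values in SERIES.items()}
for (state, skill, metric), label in results.items():
    print(f"{state:14s} {skill:14s} ({metric}): {label}")
```

Only the labels are meaningful across states here; a six-point gain in percentile rank and a ten-point gain in percent mastering cannot be ranked against each other.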

The longest series, from Wisconsin, reveals the potential
benefit of comparative data. If data from other states were also
available for this span, it might be possible to tell whether the
1978 "dip" in Wisconsin was unique to that state or occurred in
more of the nation. If it was unique, a further analysis of
events in Wisconsin might reveal a plausible cause which could be
the subject of further study, and might serve as a warning to other
states.

While within-state contrasts could contribute to a national 
profile of academic achievement as well as providing interesting 
comparisons among the states, the reports from state assessment 
systems do not, at present, contain enough information to make it 
possible to develop these contrasts in very many states.
Longitudinal trends are reported more often than subgroup
contrasts. The data bases on which the reports are based may
contain additional information that could make more within-state



Page 5.16 




contrasts of both types possible. If state officials could be 
persuaded that such contrasts would help them to interpret their 
assessment data, they might be encouraged to allocate more 
resources to reporting these analyses. 

Summary and Recommendations 

At the outset we had thought that it might be possible to
develop Consumer Report-type within-state trend and subgroup
contrast indicators from existing state data to provide an
alternative basis for between-state performance comparisons. Our
analysis indicates otherwise. The degree of conformity in
practices across states is too limited to pursue the matter
further at present.

We believe, however, that the types of auxiliary information 
collected in at least some states represent valuable sources of 
data that, if broadly collected, could provide useful contextual 
information in the interpretation of state comparisons. The idea 
of making between-state comparisons of within-state longitudinal 
trends and subgroup contrasts still has merit if the information 
were available. Moreover, the existing state testing program 
annual data collection effort is an efficient vehicle to gather 
auxiliary information to expand the set of context, resource, and 
process indicators. 

If the decision is made to proceed with a States-coordinated 
effort to link existing state tests (e.g. through the CCSSO 
Assessment and Evaluation Coordinating Center), then we urge that 
the group responsible for coordinating the test linking 
activities also develop plans for obtaining a select set of 
auxiliary information on a routine basis. Thus, to encourage and 
facilitate the range and quality of information to be provided by 
states for comparative purposes, we recommend that 

o cooperating states should be encouraged to provide to
the Coordinating Center on an annual basis uniform
documentation describing their data collection
activities;

o cooperating states should work toward the establishment
of a common set of auxiliary information about student
and school characteristics to collect along with their
testing data. A standard set of definitions for
measuring the chosen characteristics should be
determined; and

o as one of its activities, the Coordinating Center
should consider ways of contextualizing the State test
comparison data to guard against the possibility of
unwarranted interpretations of comparative results. The
auxiliary information gathered as part of the previous
recommendation should contribute to this activity.



Page 5.17



Chapter 6 
Overall Summary and Recommendations 

The results of the feasibility study conducted by CSE on
using existing data collected by States to generate state-by-state
comparisons of student performance have been described and
discussed in this report. Specific chapters were devoted to
descriptions and summaries of general characteristics of current
state testing programs (Chapter 2), alternative approaches to
linking test results across states to create a common scale for
comparison purposes (Chapter 3), detailed content analyses of
currently used state tests (Chapter 4), and the availability of
auxiliary information about students and schools and its
potential for use in generating within-state comparisons that
could serve as between-state indicators of educational progress
(Chapter 5). Each chapter was intended to focus directly on
particular concerns that need to be resolved prior to a major
effort to rely on state-developed data for comparison purposes.

The best answer to the question of whether state-level data can
be used for state-by-state comparisons is "it depends." From the
outset we knew, and through our examinations confirmed, that
there is a substantial amount of pertinent information collected
by the states. The characteristics of state testing programs are
quite diverse. While there are concentrations of testing in
certain grades during the spring, not all states operate testing
programs. Furthermore, the specific components of state testing
programs are not necessarily the same over time; in fact, during
the next few years, virtually every state will change its testing
activities, including some states that will conduct statewide
testing for the first time.

For the most part, however, movements on the testing front 
are forward and expansionary, increasing the likelihood of 
overlap in testing conditions across states. Testing changes 
within states are driven by a variety of stakeholders but 
the same sets of stakeholders (legislators, governors, business 
groups, parents, universities) are participating virtually 
everywhere. If the tendency toward a common set of goals for 
state-level educational reform efforts continues, the conditions 
for cross-state comparisons of educational performance will 
improve. Right now we can say that such comparisons using state 
data are "potentially" feasible. Given likely future 
developments across the states and selected properly targeted 
studies of the effects of different testing conditions over the 
next several years, the operative adjective could shift to
"probably"; by the end of the decade, the answer could be
"definitely" or "not in the foreseeable future". It is simply 
too difficult to speculate about what might come to pass given 
current state activities. 

Our response to the charge to the STQI Project has been to 
attempt to document current practice and to consider what could 
be done to improve the conditions for use of state data 



Page 6.1 






for achievement comparisons. Our recommendations focus on the 
conditions that would have to exist before the data from states 
could be compared, and on the steps that would need to be taken 
to implement cross-state comparisons. In the remainder of this 
chapter, we restate the recommendations derived from our
investigations. The location of these recommendations within the 
earlier chapters is noted so that the reader can readily place them 
within the context of their justification and elaboration. 

Preconditions and Guiding Principles 

Several recommendations dealt with the basic conditions that 
should exist before using data from a state in performance 
comparisons and the principles that should guide the development 
of achievement indicators from state data sources. 

ISSUE: 

Which states should be included in cross-state comparisons? 
RECOMMENDATION: 

The comparison should include only those states where there 
is sufficient empirical evidence to allow analytical adjustments 
for the effects of differences in testing conditions. All states 
that collect test data on the pertinent content areas at the 
designated grade levels or whose test results can be 
statistically adjusted to the targeted testing conditions should 
be considered for inclusion in cross-state comparisons. (p. 3.2)

ISSUE: 

What principles should guide the selection and development 
of achievement indicators derived from existing state test data? 

RECOMMENDATION: 

1. Existing state testing procedures should be disrupted as 
minimally as possible. Only those data collection activities 
considered essential for obtaining evidence of comparability 
should be introduced over and above the states' own planned 
expansions and extensions of their testing activities. 

2. Existing state tests and testing data should be used as 
much as possible. 

3. Regardless of the optimal specificity desired in the
reporting of cross-state performance, the content of the tests to
be used for comparison purposes should be specified at as low a
level (subskill or subdomain) as possible to enhance the quality
of the match to existing tests and to encourage attention to the
content and detail of what is being tested.

4. If the cross-state comparisons are to be achieved through
linking of a state's test to a common linking test, the content
covered by the linking tests should be as broad as possible both 
to ensure overlap with each state's tests and to encourage 



Page 6.2 






broadening rather than narrowing of the curriculum across the 
states. 

5. The proposed approaches for developing state-by-state 
achievement indicators should be compatible with the wider issue 
of the development of systems for monitoring instructional 
practices as well as educational progress both within and across 
the states. Desirable augmentations of current state practices
should increase documentation of student and school characteristics
within the framework of planned changes in state educational
activities. (p. 1.9)

Proposed Approach 

At various times during the STQI Project, a number of 
approaches were considered for using equating and linking 
methodologies for placing different states' test results on a 
common scale for cross-state comparisons. The deliberations on 
these alternatives by project panelists and staff, along with 
input from other participants in panel meetings and other 
groups (e.g., CCSSO representatives), led to a recommended 
approach for linking state test results and recommendations for 
its implementation. 

ISSUE: 

What approach should be used to place state test results on 
a common scale? 

RECOMMENDATION: 

1. A common anchor item strategy, wherein a common set of 
linking test items is administered concurrently with the existing 
state test to an "equating-size" sample of schools and students, 
should be used as the basis for expressing test scores from 
different states on a common scale. (p. 3.7)

2. The items contributing to the common anchor set should be 
selected from multiple sources including existing state-developed 
tests, NAEP, commercially available tests, and other policy 
relevant and technically adequate sources, such as the IEA tests.
(p. 3.12)
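
To make the common anchor item idea concrete, the sketch below shows one simple way a state's own test scores could be expressed on the scale of the anchor set: a linear (mean and standard deviation) link estimated from an equating sample in which the same students take both the state test and the anchor items. This is an assumption chosen for brevity, not the procedure specified in this report, and an operational effort might well use a different equating method. All scores and the function name `linear_link` are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(own_scores, anchor_scores):
    """Return a function mapping own-test scores onto the anchor scale.

    The link matches the mean and standard deviation of the two score
    distributions observed in the same equating sample.
    """
    mu_own, sd_own = mean(own_scores), pstdev(own_scores)
    mu_anchor, sd_anchor = mean(anchor_scores), pstdev(anchor_scores)
    if sd_own == 0:
        raise ValueError("own-test scores have no spread; cannot link")
    slope = sd_anchor / sd_own
    return lambda score: mu_anchor + slope * (score - mu_own)


# Hypothetical equating samples (made-up numbers for illustration only).
state_a_own = [40, 50, 60, 70, 80]        # percent correct on State A's test
state_a_anchor = [10, 14, 18, 22, 26]     # number right on the anchor set
state_b_own = [200, 220, 240, 260, 280]   # scale score on State B's test
state_b_anchor = [12, 15, 18, 21, 24]     # number right on the anchor set

to_anchor_a = linear_link(state_a_own, state_a_anchor)
to_anchor_b = linear_link(state_b_own, state_b_anchor)

# Statewide results reported in each state's own metric can now be
# expressed in anchor-set units and placed side by side.
print(round(to_anchor_a(65), 2))
print(round(to_anchor_b(250), 2))
```

Because each state is linked to the same anchor set rather than to another state's test, adding a state requires only its own equating sample, which is consistent with the aim of disturbing existing state procedures as little as possible.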

ISSUE: 

What additional issues should be considered in implementing 
the desired alternative for linking state tests? 

RECOMMENDATIONS:

1. The mechanisms for establishing the skills to be included
in the common anchor set, for selecting items to represent the
skills, and for specifying the rules for participation by
individual states should be developed and administered primarily
by collective representation of the states. (p. 3.12)



Page 6.3 






2. The organization responsible for developing and 
administering the linking effort should consider the following 
points relevant to implementation: 

a. Procedures for documenting contents of existing state
tests should be specified so that questions of what is being
equated to what can be addressed.

b. Specification of content represented in the common anchor
set should be at the lowest level possible (subskill level) even
if achievement indicators, at least initially, are to be reported
at higher levels (skill or content area).

c. The minimum criteria for considering an item for
inclusion in the common anchor item set should include:

o the item measures a skill selected for the common
anchor item set, and

o sufficient empirical evidence is available about the
item to ascertain its behavior for the major segments
of the student population with which it will be used.

d. The selection of items should be made by teams of
curriculum and testing specialists from a broad-based pool of 
items without identification of their source. 

e. The following set of testing conditions should be 
specified: 

o Target grades and range of testing dates along with
requirements for special studies in those states that
normally test outside the chosen range or do not test
at present but elect to participate.

o Procedures for concurrent administration of the common
anchor item set with existing state tests for the
various alternative types of state tests (matrix
sampled, state-developed single form, commercially
developed standardized test).

o Auxiliary information for checking subgroup bias and
determining sample representativeness (for equating and
scaling purposes).

o Minimum sample sizes (for both schools and students).
(pp. 3.13-3.14)



Pilot Study 

Before proceeding with full-fledged implementation of any 
approach to achievement comparisons based on test data from 
existing state programs, project participants expressed the 



Page 6.4 






belief that the impact of deviation from targeted testing 
conditions should be studied further. The desire for empirical 
evidence about the consequences of the proposed alternative led 
to project activities designed to identify content areas and
grades for an exploratory study of the proposed linking strategy.

ISSUE: 

What additional information is desirable in order to
determine whether it is practically feasible to link existing
state tests?

RECOMMENDATIONS:

1. A pilot study of the proposed common test linking 
strategy should be conducted in a limited set of skill areas for 
a specific grade range in order to determine both the quality of 
the equating under preferred conditions and the effects of 
various deviations from these conditions. (p. 3.3) 

2. The content areas and grade levels to be used in the
pilot study should be literal comprehension for reading and either
numbers and numeration or measurement for mathematics at grades
7-9. (p. 4.27)

Auxiliary Information and Documentation 

Part of the project effort was devoted to determining what 
auxiliary information states collect and/or report about the 
characteristics of their students and schools and whether it 
might be possible to develop within-state trend and subgroup 
contrast indicators from existing state data to serve as an 
additional source of between-state performance comparisons. Our 
investigations indicated that while there is a wide variety of 
auxiliary information collected across the states, there is too 
little conformity in practices at present to make such 
comparisons viable. Nevertheless, the types of auxiliary 
information collected in at least some states represent valuable 
sources of data that, if broadly and uniformly collected, could 
provide useful contextual information for state comparisons. To 
encourage and facilitate the collection and reporting of common 
auxiliary information by the states, several additional 
recommendations were made. 

ISSUE: 

What steps should be taken to encourage and facilitate the 
collection and reporting of common auxiliary information about 
characteristics of students and schools? 



Page 6.5 






RECOMMENDATIONS:



1. The organization responsible for coordinating the test 
linking activities described earlier should also develop plans 
for obtaining routinely a select set of common auxiliary 
information from states about their students and schools. 

2. Cooperating states should be encouraged to provide on an 
annual basis uniform documentation describing their data 
collection activities. 

3. Cooperating states should work toward the collection
of a common set of auxiliary information about student and school
characteristics along with their testing data. A standard set of
definitions for measuring the chosen characteristics should be
determined.

4. The organization responsible for coordinating test
linking efforts should consider ways of contextualizing state
test comparison data to guard against the possibility of
unwarranted interpretations. The auxiliary information gathered
as part of the previous recommendation should contribute to this
activity. (pp. 5.17-5.18)

Political, Institutional, and Economic Environment 

Most of our remaining recommendations regarding the 
implementation of the common test linking strategy had to do with 
the establishment of an effective political, institutional, and
economic environment for the proposed indicator effort. 

ISSUE: 

What type of environment must be established if the proposed
indicator effort is to be successful? 

RECOMMENDATIONS: 

1. To secure the necessary levels of political support for
this activity, broad-based support for the idea should be
developed. Key participants include Chief State School Officers,
their staffs, and other state education officials; other prominent
state officials, including the Governor, Members of Congress, and
state legislators; and representatives of large city
school districts, the education associations, and the private
sector.

2. An institutional structure for the conduct of this 
activity that relies heavily on the collective efforts of the 
states should be adopted. The Council of Chief State School 
Officers' new Assessment and Evaluation Coordinating Center 
proposal deserves consideration for this purpose. 






Page 6.6 



3. Technical assistance and oversight should be established to
assure the technical and methodological quality of the linking
and equating, of the content of measures, and of the validity of
interpretations. This oversight should be provided by independent
or "semi-independent" panels, perhaps modeled on the panels
advising the NAEP activity.

4. A long-term, secure basis of financial support for
coordinating and updating the test linking activity and
the collection and reporting of common auxiliary information
should be developed. This support is necessary to ensure that
modifications in the basis of comparison and in the participating
states can be accommodated over time while maintaining the
integrity of the linking effort. (p. 3.14)



Cost Implications: An Addendum

During the STQI Panel meetings and in subsequent discussions
with federal and state personnel interested in education quality
indicators, questions about costs of linking state data for
achievement comparisons were raised. Although a cost analysis was
not explicitly called for contractually, the possible cost
implications of our proposed alternative are considered in a
separate addendum to the report prepared by Darrell Bock
(Appendix 20). This addendum lays out the basis for a small-scale
feasibility study of the test linking option proposed
and provides a cost estimate of approximately $80,000 (direct
cost) assuming that approximately 3 schools from each of 5 states
(with varying testing configurations) were to participate in the
study.
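
For a rough sense of scale, the figures quoted above can be broken down with simple arithmetic. The per-state and per-school averages below are our own illustrative division of the approximate total, not figures from the addendum, and they should not be read as unit prices: much of the cost of such a study is presumably fixed rather than proportional to the number of schools.

```python
# Approximate figures quoted in the text above.
TOTAL_DIRECT_COST = 80_000   # dollars, direct cost only
STATES = 5
SCHOOLS_PER_STATE = 3

schools = STATES * SCHOOLS_PER_STATE

# Simple averages (illustrative only; the addendum gives no such breakdown).
cost_per_state = TOTAL_DIRECT_COST / STATES
cost_per_school = TOTAL_DIRECT_COST / schools

print(f"Participating schools: {schools}")
print(f"Average direct cost per state:  ${cost_per_state:,.0f}")
print(f"Average direct cost per school: ${cost_per_school:,.0f}")
```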

Note that this cost estimate is for a limited pilot of one 
grade level in a few skill areas and assumes that states would 
bear certain of the routine field costs themselves. At the 
current stage, there is insufficient information to provide
reasonable ball-park cost figures for a broader feasibility study 
at other grades with a wider range of skills or for full 
implementation of such a linking system. In our view there needs 
to be further discussion about possible directions of the 
state efforts in testing and on the desired level of effort 
toward comparable achievement indicators before such numbers can 
be reasonably generated. 



Page 6.7 






REFERENCES 



Baglin, R.F. (1981) Does "nationally" normed really mean
nationally, Journal of Educational Measurement, 18(2), 97-108.

Harnisch, D.L. (1983) Item response patterns: Applications for
educational practice, Journal of Educational Measurement,
20(2), 191-206.

Keesling, J.W. (1985) Identification of treatment conditions
using standard record-keeping systems, In L. Burstein, H.E.
Freeman, & P.H. Rossi (Eds.), Collecting Evaluation Data:
Problems and Solutions, Beverly Hills, CA: Sage
Publications, Inc., 207-219.

Neigher, W.D. & Fishman, D.B. (1985) From science to technology:
Reducing problems in mental health evaluation by paradigm
shift, In L. Burstein, H.E. Freeman, & P.H. Rossi (Eds.),
Collecting Evaluation Data: Problems and Solutions, Beverly
Hills, CA: Sage Publications, Inc., 263-298.

U.S. Department of Education (1984) State Education Statistics:
State Performances, Resource Inputs, and Population
Characteristics, 1972 and 1982.






APPENDIX 1 



PANELISTS FOR FEASIBILITY STUDY OF STATE TESTS AS QUALITY INDICATORS 

R. Darrell Bock, Professor, Department of Behavioral Science and Education,
University of Chicago

Dale Carlson, Director, California Assessment Program, State Department of
Education

J. Ward Keesling, Advanced Technologies, Inc.

C. Thomas Kerins, Manager, Program Evaluation and Assessment Section, Illinois
State Board of Education

Robert L. Linn, Professor, Department of Educational Psychology, University
of Illinois, Champaign

Edward D. Roeber, Supervisor, Michigan Education Assessment Program

Richard Shavelson, Professor, Graduate School of Education, University of
California, Los Angeles and Rand Corp.

Loretta A. Shepard, Professor, School of Education, University of Colorado

Marshall S. Smith, Director, Wisconsin Center for Educational Research






APPENDIX 2 



Telephone Interview Guide 
for 

Quality Indicators Study



I. Introduction 



1. Introduce yourself: Hello, I'm _____, from the Center for
the Study of Evaluation at UCLA.

2. State Purpose of Call: We are contacting State Assessment
Directors in regard to a study which we are conducting on behalf of the
National Institute of Education (NIE) and the National Center for
Educational Statistics (NCES). This study was prompted by a concern on the
part of Chief State School Officers about the development of appropriate
indicators of educational quality at the state level. One of the sources
of information which could possibly be used for this purpose is existing
state assessment or competency data. The reason why we are contacting you,
then, is to obtain some information about your testing or assessment
program. We hope that, based upon the information which we gather from all
the state assessment directors, we will be able to provide
recommendations about whether it is methodologically feasible and
economically reasonable to use existing state assessment information as
indicators of educational quality.

Before we begin, you should know that the study has the support and
cooperation of the Chief State School Officers, as well as that of some of
your colleagues such as Dale Carlson (California), Ed Roeber (Michigan),
and Tom Kerins (Illinois). We appreciate your cooperation and will
provide you with summaries of what we eventually produce.

To facilitate these calls, we have organized our questions into three
major sections: Overall design of program, reports, and data availability.
In the initial section, overall design, we wish merely to confirm
information which we already have and to complete any omissions. In the
latter sections, some of the questions may be answered through documents
which you could send us. If so, please indicate that and we will proceed
more rapidly.

II. Overall Testing Program
Our records indicate that:

1. Does your state have a statewide testing or assessment program
whose purpose is other than assessing the minimal competency level
of students? Yes _____ No _____

2. Does your state have a statewide minimum competency testing program?
Yes _____ No _____

If the answers to both of the above were NO, then go to Question 6 at the
end of the last section.






3. For each of the above, what areas are tested:

Assessment: Reading _____ Math _____ Writing _____ Other _____
Competency: Reading _____ Math _____ Writing _____ Other _____

4. At what grade levels are these tested:

Assessment: Reading _____ Math _____ Writing _____ Other _____
Competency: Reading _____ Math _____ Writing _____ Other _____

5. Are each of these levels tested annually, and if so what month(s)?
Yes _____ No _____

If No, on what basis are they tested? 



6. Now we would like to understand your student sampling strategy: 

Do you test all students at a grade? Yes No 

If No, please describe your sampling: 



7. For what purposes are these tests used: 



8. Are the test items developed internally or externally?

If externally, who developed them?

Name of test:

9. Are you aware of other states that use the same or some subset of
the same items?

Yes _____ (Specify which: _____ ) No _____

10. Are you planning any major changes in the program for next year?
Yes _____ No _____






III. Reports 

Now, we would like to switch our focus to the reports which your program
regularly prepares and which are generally available.

1. Do you produce the following types of reports for your program:

_____ Technical Reports, describing Psychometric Properties of the
tests,

_____ Content Reports, providing Content Specifications,

_____ Analysis Reports, providing summaries of the results.

2. Can we obtain copies of these reports? Yes _____ No _____

3. What is the most recent school year for which these reports are
available? Year _____

4. In your Content Reports, do you provide the following:

_____ Objective Statements

_____ Domain Specifications

_____ Sample Items

_____ Description of Test Construction Procedures

_____ Description of Item Sampling

5. In the Technical Reports, do you provide information about the
following:

_____ Sub-Group Differences (Specify types of information reported)

_____ Item Characteristics (Specify types of information reported)

_____ Reliability (Specify types reported)

_____ Content Validity (Specify types of information reported)

_____ Construct Validity (Specify types of information reported)

_____ Predictive Validity (Specify types of information reported)






6. We are particularly interested in all your reports which contain
results from the tests. The following questions all concern these reports.

a. Could you briefly enumerate the reports that contain results that
you regularly produce (other than reports back to the schools and
districts, though we would like to receive sample copies of these):

b. In these reports, are the results provided for a single year? _____

Or, do you provide longitudinal or time trend data? _____

If the latter, for what periods?

c. What unit of analysis do you use in these reports: school,
district, state?

d. Are the results reported in the aggregate for the whole state?

Or, do you report results for subgroups, e.g., by sex, race, socio-economic
status, language, community type?

e. If you report results for subgroups, what characteristics do you 
use to define those groups? 



f. When you report the results, what type of scale do you use?

percentiles
number correct

scale score

percent correct

other (Specify):

g. When you report the results, generally what form of statistical
summary is provided:



Measures of Central Tendency (Specify which) 
Measures of Dispersion (Specify which) 



Frequency Distributions (in what form:)

Proficiency Levels (percentages passing or reaching criteria) 



Other, Please describe: 






h. Are these statistics provided for all subgroups?



i. What statistics or method of presentation do you use for
longitudinal data?



7. Are there other reports which you produce that contain results or
information about the educational quality in your state?






IV. Data Availability 

One of the avenues we are examining is whether it might be feasible to
actually use and reanalyze state assessment data in order to derive
indicators of quality. Therefore, we would like to know about the data
which you collect from the tests.

1. Would the data you have collected from your test be available for
analysis by us? Yes No (go to 6) Maybe (Specify the
conditions: ).

If yes, what are the procedures for obtaining the data?

How long will it take?
How much will it cost?

2. Is the data available on computer tapes? Yes No

3. Is the data stored at the student level? Yes No

4. Is data available at the item level? subtest? total test?

5. Besides test scores, what additional information is stored at this
level (i.e., race, sex, etc.)?



6. Other than testing programs at your state, is other information
collected by the state which might be used for this study?
(Indicators of quality or indicators of context)
Yes No (If No, go to end.)

• What agencies house this information:



• Could you please identify appropriate contact people at these
agencies:



• What type of information is available?






• Is it available in reports? (If so, please indicate titles)



• Is it available in computer compatible format? Yes No

END: Thank you for your help with this project. As I mentioned at the
start, we will provide you with a summary of results at the end of the
project.






USE PURPOSE OF STUDY 




Addendum



[...] request for proposal by the [...] study of the use of state
tests as indicators of educational quality on a national level. The study
will address whether existing state tests may be combined to give a [...]
of educational [...]. The goal of the research will be to provide a better
[...] policy expertise to determine the [...].



APPROACH 



[...] policy advisors [...] as consultants [...] alternatives jointly
[...]. Thus, the [...] of the study is to identify the [...] approaches
and their respective technical [...]. In addition, CSE will conduct [...]
to determine the [...] assessment data. [...] the results of [...] an
examination of the materials obtained from the states, CSE will prepare a
set of recommendations regarding the relative [...] feasibility of the
different alternatives. [...] received by the Advisory [...] and their
recommendations and suggestions will form the basis for the formal project
report.



SCHEDULE 



[...] report will [...]. The [...]






APPENDIX 3 






Revised 5/15/85

"State Tests as Quality Indicators" Project
Center for the Study of Evaluation 
Partial Summary of First Policy and Technical Panel Meeting 

Washington, D.C. 
November 29-30, 1984 



The first Policy and Technical Panel meeting for the State Tests as
Quality Indicators (STQI) Project, being conducted by the Center for the
Study of Evaluation (CSE), was held at the National Institute of Education
on November 29-30, 1984. While attendance at the meeting fluctuated, the
participants included representatives from the following organizations and
agencies: National Center for Educational Statistics, National Institute
of Education, CSE project staff, STQI Project Policy and Technical Panel
members, National Assessment of Educational Progress, Office of Planning,
Budgeting, and Evaluation of the Department of Education, and the National
Association of School Boards of Education.

CSE Statement of Objectives of the Project and the Panel Meeting 

The 11/29 meeting began with a discussion of the overall objectives of
the STQI Project and of the first panel meeting. The overall project
objective is to explore the feasibility of using equated or aggregated
state testing results as national or state-by-state indicators of
educational quality. This exploration is to entail a documentation of
existing state testing program activities with specific emphasis on the
possibility of using data already routinely collected to form "comparable"
state-level indicators and, if so, to determine the types of analytical and
psychometric methods necessary or potentially appropriate to generate the
desired indicators. With respect to the latter, the original CSE proposal
had identified essentially four general approaches to derive indicators
using state data: equating of test content; econometric adjustment for
selection and/or economic and socioeconomic conditions; equating by the use
of a common test or linking measure; and methods that depend only on
within-state information such as trend data and subgroup comparisons.

The purpose of the first panel meeting was to consider which of the
available approaches for deriving indicators from state data were
potentially useful given current testing practices, and thus which
approaches CSE should explore in greater depth using reports provided by
the states. As part of the preparation for the meeting, CSE conducted
in-depth telephone interviews with representatives from state testing
programs and requested copies of existing reports and content
specifications generated by the state testing programs. The results of
these phone interviews were then combined with information from other
recent surveys of state testing activities and distributed to meeting
participants. It was expected that this information would place the
proposed approaches within a context of existing practices and aid in the
effort to refine and focus the remaining tasks of the feasibility study.

Federal Perspective on the Project 

A brief statement of the federal perspective on the intent of the
project was then provided by Emerson Elliott. In his remarks, Elliott
placed the present project within the context of recent federal initiatives
on educational indicators. These initiatives are most directly reflected
in Secretary Bell's 1984 release of the State Education Statistics Chart
and the work of the Department of Education's Indicators Project. Their
intent, along with the support for the STQI project, is to provide national
and state-by-state data that help to answer three questions. Namely,

1. What is the health of American Education?

2. What are students learning?

3. Are things getting better or worse?

Director Elliott indicated that he did not believe that the attempt to
address the above questions using state-level data as quality indicators
necessarily meant that the states must be ranked. Within-region
comparisons and longitudinal patterns within states were cited as examples
of other types of information that would serve to inform policy makers with
respect to the major questions of interest. What is of primary interest is
the compilation of a national picture of what's happening in the states
with respect to the quality of their educational programs.

Elliott's specific expectations for the STQI project had shifted
somewhat from his original objectives. Early on, he had thought that this
project might yield some indicator data that could conceivably be included
in the next (1985) release of Secretary Bell's chart. However, given the
accelerated timeline of the new chart (to be released in December 1984),
this goal no longer was reasonable. Moreover, given a new awareness about
the diversity of the existing state testing programs and the broad-based
changes in these programs that have recently occurred or are currently in
progress, it does not appear likely that existing state testing activities
can readily serve as a means of generating comparable and stable indicators
of educational achievement across the states in the near term. And, given
recent actions by the NAEP Policy Panel and the Council of Chief State
School Officers (CCSSO), it may be possible to generate state-level NAEP
performance indicators in about five years. If this were to occur, there
might be less long-term interest in using state testing data as indicators.

Given the changing situation, Elliott ultimately would like the STQI
project to provide further insights into whether the assessments states
administer and report for their own use can be synthesized to form
indicators of national trends in educational quality. In addition, he
hoped that the project could contribute material for a section on national
achievement to appear in the revised Indicators Reports to be published
periodically by the National Center for Education Statistics (NCES).

At this point, participants cited other activities on education
indicators that were related to either the federal initiatives or this
project's efforts. Other agency and organization work mentioned included
the National Academy of Science Project on Mathematics and Science
Indicators funded by the National Science Foundation, relevant sections
from the General Accounting Office's examination of the National Science
Board's Report on the Status of Science Education, CCSSO's recent vote in
support of developing state-level education achievement indicators that
might be used for state comparisons and their efforts to build state and
national capacity for collecting data on other areas relevant to
education achievement, and possible activities by Congress and the Office
of Technology Assessment. There was a general sense of movement across a
broad front to develop a national capacity to collect and report
information that may serve as indicators of the quality of the American
education system.

Description and Discussion of Available State-Level Information

The available results of CSE contacts with representatives from state
testing programs and examinations of reports from other sources regarding
these programs were described. Copies of the Telephone Interview Guide for
the first round of calls to state testing programs (Attachment I), a draft
version of a chart containing state-by-state responses to key sections of
the interviews (Attachment II; note that this chart has been updated since
the meeting to reflect additional state contacts) and a brief summary of
selected facts about state testing programs (Attachment III) were
distributed and discussed. The general consensus of participants was best
reflected in the comments from the state testing program members of the
panel. They agreed that the handouts clearly reflected accurate
information about variations in existing state programs, but that the
actual picture was even more complex than depicted. Our interview data
apparently represented testing programs fairly, particularly the kinds of
tests and the testing targets (minimal competence, basic skills, broadly
measured achievements, exceptional educational performance). Less well
detailed was the function these tests were designed to serve and how they
are currently used. All programs are subject to change but that change has
accelerated, largely as a result of state reform initiatives in response to
the National Commission on Excellence in Education report.

The discussion at this point also touched briefly on a number of other
issues and ideas, including the possibility of subgroup and/or content
disaggregation of state test results, the variation in the timing of
testing programs, the desirability of a quality indicator for state-level
longitudinal and subgroup trend data patterned after the Consumer Reports
automobile indexes or the Consumer Price Index, questions regarding the
commonality of content across states, the potential for use of shared item
banks, and better coordination and cooperation with commercial test
publishers.

Equating of Test Content

The discussion then shifted to direct consideration of the different
methodological approaches for aggregating, equating, or otherwise combining
measures as identified in the CSE proposal. The first approach considered
was the equating of test content. This approach focuses on the content of
state tests (content specifications, items, subtests) and considers
whether it is possible to classify items on some basis (e.g., commonality
of domain, difficulty) into "equivalent clusters" and then compare across
states based on performance on equivalent items. The general trend of the
discussion regarding this approach was that while it might be theoretically
possible to equate on content, in practice a considerable number of
complications exist, making the notion impractical at present. Among the
points made by participants were the following:




1. Not all states operating internally developed programs are
equally conscientious about developing content specifications for
the generation of test items.

2. Even among the states that do provide detailed content
specifications, the match of test items to specifications and the
distribution of items to objectives may be uneven.

3. To do a proper analysis of the content of state assessments as
a first step in the equating, one could not simply rely on the
content specifications. It would be necessary to examine actual
items and tests and perhaps talk to the people who put the test
together. The actual process of generating items tends to be an
iterative interplay among the specification, the examination of
the wording of each item, and the item statistics.

4. The level of abstraction that can be used to equate content is of
concern. It may be that content equating is only feasible at the
most general level (e.g., reading, math).

5. If one attempted to equate at too high a level of content
specificity, the number and nature of items that qualify as
common topics across states can artificially truncate differences
in achievement.

6. It may also be important to remember that, in practice, in order
to be able to combine items to form a score for comparison, one
needs similar items given in essentially the same format (e.g.,
not vertical vs. horizontal) at roughly the same administrative
time to the same grade under the same set of external sanctions
with respect to performance (i.e., consequences of the
performances). It may simply be impossible to satisfy all these
conditions with existing state testing programs.

7. If the current interest in state-level NAEP data continues or
expands, then the question of the match of the content emphasis
of the state testing programs with that of NAEP is worthy of
further consideration. (The same can be said for comparison with
commercial tests in states where a specific publisher has a
substantial portion of the local testing market.)

Such efforts might provide a basis for the development of a
national indicator with respect to the diversity of content of
testing programs across the states.

With respect to the possible further work of the STQI Project on
approaches emphasizing equating of test content, the discussion was
summarized as follows:



1. There were substantial doubts about the utility of equating of
test content.

2. There was some support for providing a more in-depth description
of the content of the state testing programs.

3. The possibility of developing an indicator of the diversity of the
content of state tests warranted further examination.

4. If, even given the above, one still wanted to equate content, it
would be necessary to work at some level of general objectives
(perhaps more specifically than reading comprehension).

Participants raised several additional points that emphasized a need
to go beyond an examination of test content to generate indicators of the
school curriculum. There was interest in more direct quality indicators of
curriculum activities at the state level. This interest suggests the need
for thought about how to characterize the core objectives students are
supposed to know and how to go about ascertaining this information.
Several panelists cautioned about inferring what a state teaches based on
what it tests. There was no indication that participants expected the STQI
project directly to address these concerns; however, it was clearly
perceived that any attempt to use content comparability as an indicator
must be balanced against the potential for limiting the representativeness
and validity of any such indicator as a measure of state-based activities
and goals.

Econometric Approaches 

The term "econometric approaches" was used to characterize procedures
which involved attempts at analytical adjustment of state testing data to
bring about a greater degree of comparability across states with respect to
economic and socioeconomic factors as well as to the nature of the students
within the state who take a given test. These approaches fall into two
broad categories. In the simpler category, state test data are directly
adjusted or weighted for a set of economic factors (e.g., state
unemployment rate and other indicators of the health of the state's
economy) or socioeconomic factors (e.g., indices of poverty, ethnic
make-up, bilingualism) to arrive at a set of measures that presumably
compensate for these sources of non-schooling influences on educational
achievement prior to any effort at cross-state comparison. The overall
intent of such a strategy would be explicitly to take context factors into
account in reporting state education outcome indicators.
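
The simpler category of adjustment can be illustrated with a minimal
Python sketch. It assumes, hypothetically, that each state's mean score is
already expressed on a common scale (a problem the panel regarded as
unsolved) and that a single context factor, such as a poverty index, is
available for each state. The function name and the figures are
illustrative and are not taken from the study. The residual from a
least-squares fit on the context factor stands in for the "adjusted"
indicator.

```python
from statistics import mean


def adjust_for_context(scores, context):
    """Return each state's score minus the score predicted from one
    context factor by a simple least-squares line (the residual)."""
    x_bar, y_bar = mean(context), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in context)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(context, scores))
    slope = sxy / sxx
    return [y - (y_bar + slope * (x - x_bar))
            for x, y in zip(context, scores)]


# Hypothetical state mean scores and poverty indices (illustrative only).
state_scores = [60.0, 55.0, 58.0, 49.0]
poverty_index = [10.0, 20.0, 15.0, 30.0]
adjusted = adjust_for_context(state_scores, poverty_index)
```

A positive residual marks a state scoring above what its context factor
alone would predict; the residuals sum to zero by construction, so they
describe relative rather than absolute standing.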

The second category of econometric approaches would entail employing
modern methods for adjusting for sample selection bias. Presently any
attempt at using SAT, ACT, ASVAB or other non-census testing that occurs
in multiple states as indicators is limited by the non-random and
non-comparable sample of students within a state who take these tests. If
it were possible to obtain student-level data on these tests and on the
"pertinent" characteristics of the students who take the tests, in theory,
it may be possible a) to apply selectivity modeling methods to adjust test
performance for non-random selection at the student level (within and
across states) and b) then to use the state-level aggregated adjusted
scores as a basis for equating or linking the state testing program data.
This strategy entails several strong assumptions about available data and,
even under the best circumstances, may yield results with only limited
precision.

Overall, participants were skeptical about the practicality of these
types of adjustment strategies at the present time. With respect to the
first category, there were questions about whether most states collected
the right data in comparable ways in a sufficiently accurate manner.
[...] the necessity of having to find some means of representing state
test data on a common scale. That problem would still have to be solved.
And with respect to the possibility of using selectivity modeling to
arrive at a common scale, the participants did not feel confident at this
point about such a strategy if only because too little is known about how
these methods work in practice.

[...] of using equating or linking of
tests by providing a historical perspective on other efforts pertinent to
this task. He described the Anchor Test Study which attempted to equate
commercially published standardized achievement tests for the purposes of
Title I evaluation. It was pointed out that this study required
substantial resources and time and its value quickly deteriorated as
publishers modified their test content and renormed their tests. Linn also
discussed the problems with the TIERS (Title I Evaluation and Reporting
System) data that still remained even after the Anchor Test Study and
subsequent development of NCE scales and TIERS evaluation models. In
addition to remaining equating errors, the strong effects of time of year
for testing and test administration conditions were cited.

Finally, Linn briefly discussed the question of NAEP as a common scale
for state comparisons. While this is an obvious possibility that will
attract further consideration, he reminded the participants that NAEP tests
contain only small numbers of items on any objective and may not represent
all content of interest for inclusion in state outcome indicators.

Darrell Bock then presented the basic psychometric alternatives for
equating and linking state tests. Two strategies were described. The
first strategy involves the use of common anchor items. This strategy
requires that a set of anchor items (taken from NAEP, or from a pool of
items provided by different states) be included on all tests to be equated
and that item response theory (IRT) methods be used to scale these items
within each state's tests. This strategy assumes the absence of any type
of context and location effects for item placement within a test, of
effects of time of testing within a school year, and of test administration
conditions. Many participants were skeptical about whether such
assumptions were practically justifiable.

The second strategy requires using matched data from students who take
both the state-administered test and the test chosen to serve as the
anchor. This strategy could potentially be employed in states which have
every pupil take the state test since students who took, for example, the
NAEP that year would be doubly tested. To employ this strategy, it would
have to be possible to match students person by person (i.e., students'
NAEP scores with their assessment scores). To make this practical for
state comparison, NAEP would have to test more densely in most states (Bock
estimated that it would require approximately 1000 matched kids at a given
grade level to have any confidence in the IRT equating and scaling). There
would also have to be enough information to adjust for time of
administration of tests.
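
The idea of linking through matched students can be sketched in Python
under strong simplifying assumptions. The sketch below uses linear
(mean and standard deviation) linking as a stand-in; it is not the IRT
equating described at the meeting. It assumes each matched student has
one state-test score and one anchor-test score; the function name and
the scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(state_scores, anchor_scores):
    """Build a function that maps state-test scores onto the anchor
    scale by matching the mean and standard deviation observed in the
    same (matched) group of students."""
    mu_s, sd_s = mean(state_scores), pstdev(state_scores)
    mu_a, sd_a = mean(anchor_scores), pstdev(anchor_scores)
    slope = sd_a / sd_s
    intercept = mu_a - slope * mu_s

    def to_anchor(score):
        return intercept + slope * score

    return to_anchor


# Hypothetical matched scores for the same three students.
state = [10.0, 20.0, 30.0]
anchor = [200.0, 250.0, 300.0]
to_anchor = linear_link(state, anchor)
```

With far fewer matched students than the roughly 1000 per grade level
cited above, the estimated slope and intercept would be unstable, and the
sketch makes no adjustment for time of administration.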




According to Bock, the possible advantages of this matched test data
strategy are that random, representative samples of a state's student
population are not required and a smaller sample of NAEP testing than would
be needed to use NAEP itself as a state-level achievement indicator might
suffice to use the NAEP as a benchmark. Also, the same strategy could be
used in states where a commercially published test is given to a
sufficiently diverse set of schools and students. Finally, such tests as
the SAT, ACT, or ASVAB could be used to check the calibration.

The general discussion with respect to the matched test data strategy
first focused on its costs and accuracy relative to alternatives such as
state-level NAEP testing. Archie LaPointe described the current expanded
state testing using NAEP (200 students each taking the same booklet, being
conducted in Florida, Georgia, and Tennessee) and what size of study in a
state would be required to fully implement a smaller NAEP (a so-called
Mini-BIB with 7 booklets requiring 5000 students per grade level; roughly
100 schools in each state per grade level to yield at least 1000 kids
taking 25 test items). There was some concern for BIB spiraling effects
that the new NAEP procedure would introduce. Under Bock's proposed scheme,
there presumably would be lower costs and fewer analytical complications
than for the Mini-BIB design envisioned by NAEP.

The discussion then turned to the possible complications in employing
a matched test data strategy and what kinds of information would be needed
to decide whether to implement the approach fully. It was pointed out that
the approach assumes one can obtain accurate estimates of individual
abilities. Also, it would be necessary to calibrate the test items
repeatedly because of possible item parameter drift. The concerns about
test administration conditions and time of testing would still exist. One
panelist cited the impossible tangles such a strategy poses, especially
since it was to be done retrospectively.

The question was then raised about whether the utility of the matched
test data strategy would depend on whether one wanted to compare a state's
local objectives and performance nationally or to compare states on
national objectives and performance standards. Two possible state-level
advantages for linking to NAEP were identified, namely, 1) the local and
state pressure to compare states to national norms and to other states, and
2) maintaining a certain degree of state control over tests. In the final
analysis, it was agreed that the problem required states to grapple with
the issues of the face validity for various stakeholders (state testing
directors, CCSSO, legislators, Governors, public) of three alternatives:
no common scale, equated scale, common test.

In order to decide which alternative is best, we need more
information on the following issues:

1. Will a NAEP state-by-state mini-assessment yield more than just a
total reading and math score? Would we also be able to provide
urban/rural, regional (within state) and SES comparisons?

2. Would it be possible to pilot the matched test data strategy using
existing data? Seven states (California, Florida, Illinois,
Massachusetts, Michigan, New York, and Texas) currently have
approximately 1000 students taking NAEP. What are the time and
cost estimates of piloting this strategy in a cluster of states
without additional data collection?

3. What additional new data collection would be necessary to have
sufficient data to implement the matched test data strategy in a
substantial number of states? What are the time and cost
estimates for the expanded, full implementation version of this
approach?

There was some discussion about whether there were other linking
vehicles besides NAEP, in particular the possibility of doing such equating
with commercially published tests. Several participants cited the possibly
shifting attitudes of commercial publishers toward greater cooperation with
NAEP as evidence of potential connections, and the substantial testing
already being carried out in some states (e.g., CTBS and CAT used as state
tests and by a large number of districts in some states which don't require
it) which makes the use of commercial tests as a common link at least
technically feasible.

In general, there was a consensus that the STQI project should devote
further effort to identifying and describing the conditions states would
have to meet to develop a common scale by using an anchoring approach of
either type described above. This examination would presumably focus on
technical considerations (timing, dimensionality characteristics of the
test, sample size needed) and resource and time considerations.

Within State Trends 

The last approach discussed involved attempts to rely strictly on
within-state data to yield cross-state comparisons. Operationally, this
approach might entail developing indicators of longitudinal trends in
performance within the state or subgroup (e.g., rural/urban, SES, ethnic or
other student and school contextual characteristics) comparisons (either
cross-sectionally or over time). If there were a sufficient number of
states collecting: a) comparable data over time and b) comparable
information that would allow disaggregation of test performance to the
level of identifiable subgroups of students and schools, performance
indicators based essentially on effect-size estimates (e.g., the
year-to-year gains or urban-rural differences expressed in standard
deviation units) could potentially be developed.
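
An effect-size indicator of this kind can be sketched in a few lines of
Python. The sketch assumes a state reports means, standard deviations, and
group sizes on its own test; the choice of a pooled standard deviation as
the base is only one possible convention, and all names and figures are
hypothetical.

```python
from math import sqrt


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled standard deviation of two groups (or two years)."""
    return sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2)
                / (n_a + n_b - 2))


def effect_size(mean_a, mean_b, sd):
    """Difference between two means in standard deviation units."""
    return (mean_b - mean_a) / sd


# Hypothetical year-to-year gain within one state, on that state's own scale.
sd = pooled_sd(12.0, 5000, 12.0, 5200)
gain = effect_size(50.0, 53.0, sd)
```

Because the difference is divided by a within-state standard deviation,
the result does not depend on the units of the state's test, which is what
would allow trends (though not levels) to be set side by side across
states.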

There are several potential problems with the within-state trends
approach. The within-state comparisons would provide indicators of trends
but not levels (relative versus absolute performance). Any changes in
tests over time would potentially affect the validity of the longitudinal
comparisons. Also, while states might nominally collect the same
information relevant to classification by important subgroups,
operationally, the specific measure of a given characteristic (e.g.,
definition of an urban versus a rural school, measurement of SES, ethnic
and language classifications) used by states may differ sufficiently to
hinder seriously attempts at cross-state comparison on this basis. The
analytical model that would underlie such indicators (i.e., choice of
standard deviation to serve as the base for the effect-size estimate, and
the model of normal growth underlying longitudinal trend measures) would
also require further thought.

The consensus recommendation of the meeting participants with respect
to the within-state trend approach to educational indicators was that this
approach warranted further examination in the hope that it may be feasible
to derive a Consumer Report-type up-down trend indicator to include along
with other achievement indicators that more directly reflect absolute
levels of performance.




Closing [...]

[...] the final [...] obtaining information about where state testing is
going over the next five to eight years. He also hoped that the [...]
guidance about the value [...] indicators based on [...] the diversity of
[...].

1. Complete the interviewing about state testing activities and
develop a chart that characterizes these activities.

2. Continue to [...] data at the state level.

3. Conduct an examination of the content of state tests, including
analysis of both content specifications and actual items where
feasible.

4. Explore further the feasibility of developing summary Consumer
Report-type indicators of trends with respect to diversity of
content measured, complexity of skills measured, longitudinal
changes, and subgroup differences.

5. Attempt to provide resource and time [...] and fully implement
the approaches judged to be fruitful to arrive at state-level
education indicators.






APPENDIX 4 

Decision Memorandum on the Feasibility of Using State Level Data
for National Educational Quality Indicators 

Eva L. Baker and Leigh Burstein, Center for the Study of 
Evaluation, UCLA 



Background 

The desire for a national picture of educational quality 
remains a continuing but unresolved goal. Past efforts using 
available data from college admission tests have provided one 
source of information, but have been criticized because they 
represent performance of only one segment of the student 
population. Results from administrations of achievement measures 
of the National Assessment of Educational Progress (NAEP) provide 
a partial picture, but are limited because of the general 
character of the measures and the schedule upon which they are 
administered. Furthermore, because of NAEP sampling practices,
no state by state comparative data are possible. 

In the past, there has been some resistance from states 
about comparative information of any sort. The arguments have 
centered on the need for good contextualization of information so 
that differences in performance can be properly attributable to 
quality of educational services and not to social and economic 
conditions in the regions themselves.

A national test has been proposed periodically as a 
solution, but has been rejected because of the constitutional 
delegation of educational responsibilities to the States and the 
attendant notion that such a test would exert untoward Federal 
pressures toward uniformity in educational practices. The cost 
of such a new test (or radical expansion of the NAEP sampling and 
scheduling) would also be high. 

Last fall, a question was raised among high level policymakers 
regarding the feasibility of using existing mechanisms within the 
States to contribute to the picture of American educational 
quality. Specifically under consideration was the extent to 
which existing measures of student performance collected by the 
States could be combined to 1) provide a national profile of
performance in achievement domains; and 2) provide a basis for
state-by-state comparisons of student performance. A feasibility study
was contracted to the UCLA Center for the Study of Evaluation 
(CSE) to explore the methodological and implementation issues of 
such an approach. This memorandum represents a summary of these 
analyses and recommendations regarding the feasibility of this 
approach. 

Feasibility Study 

A panel of scholars and practitioners was convened to engage 
in discussion of these issues. A list of participants is 
appended. These meetings were held in Washington, D.C., and were
open to interested observers from government and professional
organizations. Following the first meeting, CSE staff and the 
panel members developed options, collected information, and 
distributed preliminary findings. At a second meeting this 
spring, a general consensus was reached. 

Methodological issues. 

The group considered a range of methodological options for 
combining State-level data for national comparative purposes. 
Opinions converged on using a common test linking and equating 
approach based on the administration of relevant common measures 
along with each state's own test to a sample of students. 

Two concerns needed to be addressed before a decision could 
be reached about how this linking strategy might be applied. 
First, the question of possible content of the common tests was 
raised. To that end, CSE staff prepared a content analysis of 
tests or specifications of tests from 38 responding states who 
were conducting testing programs as of Spring 1984. The results 
of this analysis are included in our larger report. Based on our 
findings, the panelists recommended that two or three skill areas 
at a single grade level be chosen for initial examinations of 
equating options based upon the frequency of the skill areas' 
inclusion in State measures and the frequency at which various 
grade levels were represented in State test administrations. The 
areas of literal comprehension in the reading achievement area 
and either numbers and numeration or measurement in the 
mathematics achievement area at grades 7 through 9 were 
considered most suitable for initial equating efforts. 

The second concern was the nature of the common measure 
proposed to serve as the basis for equating the disparate state 
measures. It was determined that technical procedures now exist
that make it possible to equate tests without requiring that all 
sampled students respond to the same set of common items. 
However, the measures needed to share certain technical 
characteristics with the target measures in reading and math. 
Principal among these characteristics was unidimensionality of 
the scale. 

Options 


Various options were considered for the common linking 
measure. These will be briefly described below with a statement 
of their benefits and limitations. 

Option One: Using NAEP measures for equating purposes 






Benefits:



1. Measures exist 



2. Measures have been developed with appropriate 
technical expertise 



Limitations:

1. NAEP is not administered on an annual basis.

Most State measures are administered annually and
the goal of the Quality Indicators effort is annual
reporting. Therefore, NAEP schedules might be
changed, at a significant cost, or the equating
would become intolerably imprecise if "old" NAEP
measures were used in between NAEP administration
periods.

2. The current density of NAEP sampling does not
provide a basis for equating in most states. NAEP
sampling could be augmented, which would increase
administration costs and would entail certain
difficulties in interpretation of longitudinal data.

NAEP and state tests would have to be available from
the same sample of students, at the same point in time.
If NAEP schedules were adjusted to concur with state
testing schedules, then the NAEP data might not blend
with the established NAEP testing schedules. If the
state testing dates were altered to correspond to the
NAEP dates, then data from the sample schools might not
be equivalent to data obtained as part of the regular
state testing effort.

Option Two: Creating a common pool of items drawn from existing 
State measures for use in equating 

Benefits: 

1. Measures exist (either state developed or publisher
provided) and have empirical data associated with
them.

2. Because measures would be derived from tests
already used by States, they would more adequately
reflect at least some local goals.

3. Cooperation and contribution to the pool would
encourage state capacity building and the
exchange of technology from states with better
developed testing programs to those in relatively early
stages.

4. Skill and content areas for equating would not
be limited to current NAEP content areas, but could be
developed based upon the actual interest and
distribution of tested topics.

5. Costs for data collection would be low because the
procedure would be integrated with normal State testing.



Limitations: 



1. This approach is dependent upon State cooperation.
State cooperation in turn depends upon the political
climate and local pressures upon a Chief State School
Officer and the state testing program's operations.

2. Pilot studies would need to be conducted on the test
pool used for equating on any skill or content area.

3. An organizational structure would need to be created 
£ji?Z?Jf? e thi f t P ro «ess and to assure technical and 
political sensitivity of the approach. 

4 * sfM^ n ^ a ^?' iCCe f S f Ul trial Period, some regular 

sss Su'sr^issr" external to ****** 



Recommendation:

We recommend that the state item pool strategy be tried on an
exploratory basis for a two year period, after which judgments
about continuation, modification, or expansion could be made.

Implementation Issues Relevant to the recommendation 

It will be a serious matter to develop the necessary levels of
political support for this activity. Key participants are, of
course, the Chief state School Officers, their staffs and ai-Er 

Chief Stat, school Officers' S.«^" t .n1'. ! ^?ulS 0 ? UnC " 






Coordinating Center proposal deserves consideration for this 
purpose. 

Third, it is essential that technical assistance and oversight 
be established to assure the quality of technical and 
methodological operation of the equating, of the content of 
measures, and of validity of interpretations. This oversight 
should be provided by a panel, perhaps modeled on the panels 
advising the NAEP activity. 

Fourth, a long-term, secure basis of financial support for this
activity should be assured. The costs will not be high but 
resources should be regularly available. 



Additional Technical Comments 

Our interviews with state testing officials and examinations 
of reports and tests currently provided by individual states 
indicate an extensive range of activities of varying 
sophistication and quality. Many states collect and/or report a
wide array of auxiliary information about their students and
schools along with their test data. Some states maintain and
report longitudinal trends, and a few provide within-state
comparisons, cross-sectionally or over time, broken out by major
student and school sub-groups (e.g., student sex, school size,
type of community). These auxiliary indicators also represent
valuable sources of data that could provide useful contextual
information in the interpretation of state comparisons. The
group coordinating the State Item Pool could be responsible for
developing strategies for obtaining this ancillary information
on a routine basis.



To encourage and facilitate the range and quality of information 
to be provided by states for comparative purposes, we make the 
following additional recommendations. 

o Participating states should be encouraged to provide on 
an annual basis uniform documentation describing their
data collection activities (along the lines currently 
provided through the Education Commission of the States 
and the Roeber survey). 

o Uniform standards for documentation of the contents of 
State-administered tests should be established. In the
case of states using existing publisher-provided,
standardized tests, the publishers should be responsible 
for providing the report to the state for transmittal 
to the coordinating center. 

o Cooperating states should work toward the establishment 
of a common set of auxiliary information about student 
and school characteristics to collect along with testing
data. A standard set of definitions for measuring the
chosen characteristics should be determined. 

o As one of its activities, the coordinating center should 
consider ways of contextualizing the State test 
comparison data to guard against the possibility of
unwarranted interpretations of comparative results. 

A critical caveat is that these recommendations relate to 
State testing systems that are changing significantly. We
believe that these changes, toward testing more students, more 
grade levels and more subject matters, will facilitate the 
capacity of state testing systems to contribute to a fuller 
national picture of educational quality. 



APPENDIX 5 

SOURCES OF INFORMATION ABOUT STATE TESTING PROGRAMS 



1. Center for the Study of Evaluation, "Results from the Survey 
of State Testing Programs for the Quality Indicators Study," 
based on telephone interviews conducted November 12-26, 
1984. 

2. Southern Regional Education Board, Measuring Educational 
Progress in the South; Student Achievement . Atlanta, GA, 
1984. 

3. Roeber, E.D., "Large-scale Assessment Programs: Program
Descriptions, Summer 1984," Lansing, MI: Michigan
Department of Education.

4. Roeber, E.D., "Survey of Large-Scale Assessment Programs:
Fall 1983," Lansing, MI: Michigan Department of Education.

5. Anderson, B., "Status of State Assessments and MCT," Denver,
CO: Education Commission of the States, March 14, 1984.

6. Pipho, C., "State Activity: Minimum Competency Testing,"
(Contained in Anderson, March 1984), Denver, CO: Education
Commission of the States, January 1984.

7. Council of Chief State School Officers, "A Review and
Profile of State Assessment and Minimum Competency Programs,
1984."

8. Pipho, C. and Hadley, C., "State Activity: Minimum Competency
Testing as of December 1984," Denver, CO: Education
Commission of the States.

9. Anderson, B., "Current Status of State Assessment Programs
as of December 1984," Denver, CO: Education Commission of
the States.






APPENDIX 6 






QUALITY INDICATORS SURVEY SUMMARY 



Page 1 



General Characteristics 



STATE 



TESTING Used For: No. of ASSESSMENT PROGRAM: ^ or 

Have State Co»pet*ncy/ Testing Areas Tested: Grade Selection Naae of Source of Itea* Snared Planned 

Projraa Assesswent Proficiency Programs Reading. Math, Other Levels Census Saaple Test Internal External Iteats thanues 

Y H ~ ^ ^ ^ i r i 1 nr v" 



AL 
AK 
AZ 
AR 
CA 
CO 
CT 
DE 
FL 



Y 

• • • 



.t. 

Y{L)** 



. yd. j 

Y 

Y{L) 



Y * 

• • ■ 



15 J 



*Tested every 2 years 
**Local Option 



.1. 

2 
2 



• ■ ■ «R »M • ■ • 



R.M.O 



R.M.W.0 



R.M.O 

R.M.W.O 

R.M.H 



1.2.4.5. 



4.8t 

M2 
4.7.10 



3.6.8. 
10.12 



4.8,11 

• • • • 

1-8.11 
3.5.8.10 



New tests In 1985 
"♦♦New tests this year 



SRA 



CAEP 

CT B S ** 
SSATT 



.t. 
.X. 



? 

• • • 
N 

• • • 
N 



*****According to the state testing director, the Florida
assessment and competency tests are the same.



o 

X 

Y **** °* 






QUALITY INDICATORS SURVEY SUMMARY 



Pa&e 1 



General Characteristics 



STATE 

GA 
HI 
ID 
IL 
IN 
IA 
KS 
KY 
LA 



TESTING Used For: No. of ASSESSMENT PROGRAM: 

Have SUte Competency/ Testing Areas Tested: Grade Selection Nam of 



Soiree of I teas 






Proarui Assessment Proficiency Programs Reading, Hath. Other Levels Census Sample Test Internal External 
T N T N Y N C 5 1 — 1 



N 

Y 
Y 



H 

Y 



.V. 



Program to start In 1985 






.i. 
i 



R.N.O 
R.H.W 



R.M.W.O 



R.M.W.O 

.R.K.R.. 



1-3.6. 
8.10 

2.4.6, 
8.10 



4.8.U 



3.5.7. 
IP... 



.P. 
C 






SEVERAL, .'TBS 
SAT 



....PIPS. 



I.E 
I 






Major 
Stored Planned 
Items Changes 



QUALITY INDICATORS SURVEY SUMMARY 



Page 1 



General Characteristics 



STATE 


TESTING 
Have 
Program 


Used For: 
State Competency/ 
Assessment Proficiency 


No. of 

Testing 

Programs 


ASSES91ENT PROGRAM: 

Areas Tested: 
Reading, Math. Other 


Grade 
Levels 


Selection 
Census Sample 


Name of 

Test 


Source of Items 
Internal External 


Shared 
Items 


MJor 
Planned 
Changes 




V N 


Y N 


Y N 








c r 




I 


L 


V U 


V k 


ME 


y 


Y 

• • • 


N 


1 


R.M,W,0 


4.8.11 


C 


- 


E 




Y 


Y 


MD 


Y 


Y 




2 


R.M.O 


3,5,8 


C 


CAT 






N 




MA 


Y 


H a 


















• • 


. . • 


MI 


Y 


Y* 




2 


R,I1.W,0 


4.7.10 


c 








• • 
N 




MN 


Y 


Y 

■ • • 




1 


R,M,W,0 


4.8,11 


s 








• • 

Y 




MS 


Y 


Y 

• • • 


Y** 


2 


R,M,0 


4.6,8 


c 


CAT (old) 






• * 
N 




MO 


Y 


Y 




2 


R.M % 0 


6,12 


s 








• • 

N 




MT 


Y 


Y 

■ ■ • 




1 


R.M.O 


6,11 


V 








N 




Hi 


Y 


N 


Y(L) 


1 


















i5 < 




*According to the state testing director, the MI assessment and competency tests are the same.
**MS assessment program was dropped after 1983 and a new competency program is being implemented.






• * 





QUALITY INDICATORS SURVEY SUMMARY 



Page 1 



General Characteristics 



TESTING Used For: No. of 

Have State Competency/ Testing 



ASSESSMENT PROGRAM: 
Areas Tested: 



Grade 



Selection 



STATE Prograw Assessment Proficiency Programs Reading, Math. Other Levels Census Sam ple 
V H ^7 R Y N — 1 — c V 



Name of 
Test 



Major 

Source of Items Shared Planned 
Internal External Items Changes 
— i E TTT TT 



NV 

NH 
NJ 
NM 
NY 
NC 
NO 
OH 
OK 



N 

Y 

• • • 
N 



Y 
Y 
Y 
Y 



R.M.O 
R.M.W 



R,M.W,0 



3.5,8 

3.5,6 
• • • • 

1-3.6,9 



CTBS 
PEP 



CAT 






QUALITY INDICATORS SURVEY SUMMARY 



Page 1



General Characteristics 



TESTING Used For: No. of ASSRSWENT PROGRAM: 

Have State Competency/ Testing Areas Tested: Grade Selection 



r 



Name of 



STA1E Program Assessment P roficiency Programs Reading, Hath, Other levels Census Sample Tes t 



Major 

Source of Items Stu »d Planned 
In ternal External Items Changes 

— i e rx Y N 



OR 
PA 
RI 

SC 
SO 
TN 
TX 
UT 
YT 



Y* 

• • • 

Y 

• • • 
Y 

• • • 
Y 

• • • 
N 



Y 

• • • 



Y 

• ■ • 



Y 

• • • 



• • • 



Y 

• • • 



Y** 
• • • 



Y** 

Y 



Y 

• • • 



Y 

• • • 

Y 

• • • 
N 

• • * 



2 

• • • • 



R.N.M 
R.M.W.O 



R.M.O 
R.K.O 



R.H.O 
R.M.W 



R.H.O 



4.7.11 

• • • • 

5,8.11 

4.6.8-10 
4.7.10 



2,3,5-8 
9-12 



3.5.9 
5.11 



ITBS 



CTBS-U 



Metropolitan 



CTBS-U 



I/E 
J 
E 
E 

E 
I 



.t, 

A. 

Y 

Y 

Y 
• • 



*Up to 1985 have tested only every 4 years; will be annual shortly in 1985 at grades 3 ( &.tt,ll
**New this year

***According to the state testing director, the TX assessment and competency tests are the same.






QUALITY INDICATORS SURVEY SUMMARY 



Page 1 



General Characteristics



TESTING Used For: 

Have State Competency/ 

STATE Program Assessment Proficiency 

V N R V H 



No. of ASSESSMENT PROGRAM: 

Testing Areas Tested: Grade Selection 

Program Reading, Math, Other levels Census Sample 

I T 



Name of 
Test 



Major 

Source of Items Shared Planned 
Internal External Items Changes 

— i e nr v n 



YA 
WA 

wv 

WI 
HY 



y** 



R,M P 0 
R.M.O 



R,M ( 0 
R,M,W,0*** 



8,11 



6,9,11 



8,11 



3.4,7, 



C/S* 



SRA 
CAT 



CTBS-U 
CTBS 



NAEP 



*Sample at grade 11
**Now In development 
***Tested every three years 






QUALITY INDICATORS SURVEY SUMMARY



General Characteristics continued 



COMPETENCY PROGRAM: 

Areas Tested: Grade Selection 

STATE Reading, Math, Other Levels Census Swle 

t T 



AL 
AK 
AZ 
AR 
CA 
CO 
CT 
OE 
fL* 



R,M 



R.M.W 
R,M 



R.M.O 



3.6,9,11 



8.12 
3.4,1 



R.N.W 4.6-8,9 





R,M,W 

*Same as assessment program. 



3,5, 
8J0 



•8 C{4),S 



12 



C 



Name of 

Test 

ABCT.AHSGF 

Local 
CRT 
Local 

CRT 

... Ami.. 

SSAT 



Major 

Source of Items Shared Planned 
Internal External Itcas Changes 

i e nr y n 






QUALITY INDICATORS SURVEY SUMMARY 



Page 2



General Characteristics continued 

COMPETENCY PROGRAM: Major 

Areas Tested: Grade Selection Nawe of Source of I tew Shared Planned 

STATE Reading. Math, Other Levels Census Sawle Test Internal Externa 1 Iteas Changes 

1 S 1 E — T~T Y W 



GA 
HI 
ID 
IL 
IN 



R.M.W 
R.M.U 



R.M.W.O 



R,M 



R,W,H 



4.8, 

10-12 



3,9-12 
8 



3,6, 
8,10 



2-4. 
6,8,10 



2-5 



CRT # 
NWRL 



NEW 
LOCAL 



17 



Y 

• > . 






QUALITY INDICATORS SURVEY SUMMARY 



General Characteristics continued 



COMPETENCY PROGRAM: 

Areas Tested: Grade Selection 
STATE Reading, Math, Other Levels Census Sawp le 
~t 5 — 



Naw of 

Test 



Major 

Source of I teas Shared Planned 
Internal External Items Change s 
I E TTT V N 



ME 

MD 

MA 

MI* 

MM 

MS 

MO 

MT 

NE 



H.M.U 
R,M 



R.M.O 



R.M.U (new) 
a.M.O 



7.9 



4,7,10 C 



3.5. 
8, 11 



9-12 



R.M.W.O 5+ 
*Same as assessment program.



Local 



"Best Test" 



Local Option 






QUALITY INDICATORS SURVEY SUMMARY 



Page 2 



General Characteristics continued 



NY 
NH 
NJ 
MM 

NY 
NC 
ND 
OH 
OK 



COMPETENCY PROGRAM: 

Areas Tested: Grade 



Selection 



Name of 



Major 

Source of liens Shared Planned 



STATE Reading. Math. Other Levels Census Saylc Test Internal External Items Changes 



R,M, 

M,0 

R,M,W 

R,M,W 

R,M,W 

P.M.W* 



R.M.W 



3.6. 
9-12 



4,8 

9-1 



10 



1,2 
6,9 



12 



T 



T 



,s 
c 
,s 
c 



CAT 

Local(new) 



Regents 



Local 



1/E 



"V (T Y k 



*New in 1985



QUALITY INDICATORS SURVEY SUMMARY 



General Characteristics continued 



STATE 



COMPETENCY PROGRAM: 

Areas Tested: Grade 
Reading, Math, Other Levels 



Selection 
Census Sawple 
t V 



Nane of 

Test 



Major 

Source of I teas Shared Planned 
Internal External Item* Grtanges 
1 E — TT' TT 



OR 
PA 
RI 

SC 

SO 

TN 

TX* 

UT 

VT 



R,M 



R.H.O 
R,H,U 



R,M 



R.M.H 



R.M.W.O 



3,5,8 



8,10 

1-3,6, 
8,11 



9-12 

1,3,5,7, 
9,11,12 



1-12 



t:lls 



Life Skills 

Basic Skills 
Assessnent 



Local 



Y 

• • 

N 

• • 

N 



*Same as assessment program . 






QUALITY INDICATORS SURVEY SUMMARY



Page 2* 



General Characteristics continued 



COMPETENCY PROGRAM: 

Areas Tested: Grade Selection 

STATE Reading. Hath, Other Levels Census Sawple 

— I 5 



Na«e of 

Test 



Major 

Source of Items Shared Planned 
Internal External I tews Change 

— i e rr ~rnr 



VA 
UA 
UV 
WI 

WY 



R.H.O 



R.H.O 



R.H.W 



K-6, 

1.-12 



3,7,10 



Voluntary 



Local 






APPENDIX 7 



Common Test Linking Issues 



Contents: 

Notes by R. Darrell Bock 

Letter to Leigh Burstein, from Robert L. Linn

Comments on Bock notes by J. Ward Keesling

Letter to Leigh Burstein, from Dale Carlson

Letter to Leigh Burstein, from Edward D. Roeber, Ph.D.

Letter to Leigh Burstein, from Tom Derlns, Ed.D. and Jack Fyans, Ph.D.

Letter to Leigh Burstein, from Lorrie A. Shepard






Using data from the National Assessment of Educational
Progress to link state assessment results 

R. Darrell Bock 

University of Chicago 

March 1, 1985



The National Assessment of Educational Progress (NAEP) as 
now conducted by Educational Testing Service can provide data 
that would enable assessment results from many of the states to
be expressed on a common scale. Scaled in this way, these
results could be used in comparisons of educational
attainment among the states participating in such an effort. 
Because of relatively small sample sizes in some states, present 
NAEP data can be used only for national and regional reporting, 
and not for between-state comparisons. In most states, however, 
the NAEP samples are large enough to support the scaling 
procedures required to establish a common basis for comparisons 
among state assessments. 

The possibility of using the data in this way arises from 
NAEP's practice of assigning case numbers to each pupil's record 
in the public-use files. These case numbers are associated with 
corresponding pupil names and grades on rosters that are left in 
the possession of the schools where testing was carried out.
(Pupil names never leave the schools.) For public schools at
least, these rosters are presumably available to the state
assessment programs, probably in the form of a list prepared by
the school that associates the NAEP case number with a 
corresponding state assessment case number. 

A basis thus exists for identifying pupils who have taken 
both the NAEP assessment exercises and the state assessment 
exercises or tests, and for locating the item responses of these 
pupils on the NAEP public-use tapes. In those states, such as 
California, that test all pupils in the state at certain grade 
levels (i.e., perform a complete census of the state), these 
joint results will be available routinely when the NAEP and the 
state are testing at the same grade level. In states that test 
in only a sample of schools, special provision would have to be 
made to supplement the state sample with the schools in the NAEP
sample. 

That the NAEP testing is limited to grades 4, 8 and 11 will 
present difficulties, however, in those states that do not also 
test at these grade levels. Such states will have to arrange 
special administrations of their tests in the schools and at the 
grade levels of the NAEP testing. If, for example, a state 
system tests in sixth grade, that test would in most skill and 
content areas probably have sufficient range of difficulty to be 
successfully administered in the eighth grade for purposes of 
scaling. Even when there is no conflict of grade levels, 
differences in the time the tests are given during the school 
year may create a problem. If several months elapse between
the NAEP and the state testing, special studies would have to be
carried out to establish the rate of change of scores during the
year as a basis for correcting the results to a common date.
But in all these problems, changes in the conduct of state
assessment to conform to NAEP practices would be a better
solution.
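
The correction to a common date mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it supposes that score growth within the school year is roughly linear, and the function name, the gain-per-month figure, and the dates are hypothetical rather than taken from any NAEP or state procedure. The rate itself would have to come from the special studies described in the text.

```python
from datetime import date

def adjust_to_common_date(score, tested_on, common_date, gain_per_month):
    """Shift a score to what it would be expected to be on common_date.

    Assumes (hypothetically) a constant gain per month within the school year.
    """
    months_elapsed = (common_date - tested_on).days / 30.4375  # average month length
    return score + gain_per_month * months_elapsed

# Hypothetical example: a state tested in mid-January, NAEP in mid-April.
state_mean = adjust_to_common_date(250.0, date(1985, 1, 15), date(1985, 4, 15), 2.0)
print(round(state_mean, 1))
```

A state tested earlier than the common date has its mean moved up by the expected growth, and a state tested later would have it moved down, since the elapsed time is then negative.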

A more serious hindrance in the NAEP practice is their
policy of testing only biennially, and only in a few content
areas at one time. Thus, NAEP tested in Reading and Writing in
1983-84, and will test in Reading, Math, Science, and Computer
Understanding in 1985-86, and in Reading, Writing, Math, and
Science in 1987-88. Any attempt to link state assessment
results using the NAEP data would therefore have to extend over 
a period of years, and even then might not include topics in 
state assessments outside these main areas. Nevertheless, the 
range of content in the complete NAEP cycle is broad enough to 
encompass the essential subject matter of primary and secondary 
schooling. Within the main areas, on the other hand, the NAEP 
exercise sets are large and varied and would probably parallel 
many exercises and items in the state assessments. Drawing 
these parallels in a comparable way in all of the participating 
state assessments would of course be essential to the linking of 
results. This problem is discussed below. 

Another aspect of the NAEP design that creates difficulties 
for the analytical methods of scale linking is the sparseness of 
the present matrix sampling design. In the 1983-84 Reading
assessment, for example, 139 items are matrix sampled in
forms, each containing about 20 items. In any of the reading
* uteres* sufficiently homogeneous to report in one score, any 
given pupil is presented only a small number of items, six to 
nine in most cases. As a result, any equating or scaling method 
that requires computation of scores for individual pupils will 
be impaired by the instability of scores computed from so few
items. In particular, conventional linear or equipercentile
equating of parallel forms, such as used in equating SAT forms, 
cannot be justified if, as is likely, the NAEP scores and the 
state assessment scores differ greatly in reliability. 
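
For readers unfamiliar with the conventional approach referred to above, a minimal sketch of linear equating follows. It is an illustration only: the function name and the score lists are hypothetical, and it assumes both tests were taken by the same or equivalent groups. Linear equating matches the mean and standard deviation of the two score distributions. When scores on one test rest on only six to nine items, a large share of their observed spread is measurement error, which is one way to see why the resulting conversion is hard to justify.

```python
from statistics import mean, pstdev

def linear_equate(x_scores, y_scores):
    """Return a function mapping a score on test X onto the scale of test Y.

    The mapping matches the mean and standard deviation of the two
    observed score distributions (linear equating).
    """
    mx, sx = mean(x_scores), pstdev(x_scores)
    my, sy = mean(y_scores), pstdev(y_scores)
    slope = sy / sx
    return lambda x: my + slope * (x - mx)

# Hypothetical scores from the same group of pupils on two tests.
to_y = linear_equate([2, 4, 6, 8], [40, 50, 60, 70])
print(to_y(5))
```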

Only those methods that estimate scaling constants directly 
from the item responses, without calculation of intervening 
scores, are suitable for this type of matrix sampled data.
Fortunately, such methods are now available in item response
theoretic (IRT) scaling based on marginal maximum likelihood
procedures introduced by Bock and Aitkin (1981). These methods,
which require large samples of persons but not large numbers of
item responses from any given person, are ideally suited to the
analysis of matrix sampled data. They are already used for that
purpose by the California Assessment and for certain phases of
the NAEP analyses. 

The variant of these methods that would apply in the present
case is a form of "old-test, new-test" technique. It is assumed
that item parameters for the scale in question (the old test)
have been estimated in the NAEP national sample. These item
parameters are then used to compute the posterior distribution
of the pupil's ability, conditional on his responding correctly
or incorrectly to given items of the new test (comprised of
items from the same content domain in the state test). In the
Bock-Aitkin marginal maximum likelihood method of estimating
item parameters, this distribution is represented by posterior
densities on a finite number of points for purposes of numerical
integration during marginalization. The item parameters of the
new test are estimated by maximum likelihood from the sum of these
conditional densities over the sample of pupils (which is
assumed to be large). The calculations are carried out
iteratively by the so-called "EM algorithm" until stable values
of the parameter estimates are obtained. These item parameter
estimates are then used to compute scores for pupils in the
state sample, preferably by the Expected A Posteriori (EAP)
method (see Bock and Mislevy, 1982). The Posterior Standard
Deviations (PSD) of these scores can be interpreted as standard
errors for purposes of expressing their precision.
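
The final scoring step described above can be illustrated with a short sketch. Several things here are assumptions not stated in the notes: a two-parameter logistic item model, a standard normal prior, an evenly spaced grid of 41 points between -4 and 4, and all function and item names. The sketch covers only the computation of an EAP score and its PSD once item parameters are in hand; it does not attempt the EM estimation of the item parameters themselves.

```python
import math

def two_pl(theta, a, b):
    """Probability of a correct response under a two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_score(responses, items, n_points=41, lo=-4.0, hi=4.0):
    """Return (EAP, PSD) for one pupil.

    responses: dict item_id -> 0 or 1, only for items the pupil was given
    items:     dict item_id -> (a, b), previously estimated item parameters
    """
    step = (hi - lo) / (n_points - 1)
    points = [lo + k * step for k in range(n_points)]
    weights = []
    for t in points:
        prior = math.exp(-0.5 * t * t)  # standard normal prior, unnormalized
        likelihood = 1.0
        for item_id, u in responses.items():
            a, b = items[item_id]
            p = two_pl(t, a, b)
            likelihood *= p if u == 1 else (1.0 - p)
        weights.append(prior * likelihood)
    total = sum(weights)
    posterior = [w / total for w in weights]
    eap = sum(t * w for t, w in zip(points, posterior))
    variance = sum((t - eap) ** 2 * w for t, w in zip(points, posterior))
    return eap, math.sqrt(variance)

# Hypothetical item parameters and one pupil's responses to three items.
items = {"i1": (1.0, -0.5), "i2": (1.2, 0.0), "i3": (0.8, 0.5)}
print(eap_score({"i1": 1, "i2": 1, "i3": 0}, items))
```

Because the score is a posterior mean rather than a count of correct answers, it remains defined even when a pupil answers only a handful of items, and the PSD grows accordingly to reflect the reduced precision.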

Provided the same prior distribution is assumed for purposes
of marginalization (a normal distribution with mean 500 and
standard deviation 100, for example), the EAP scale scores
computed from the data of different states will have the same
origin and unit and will thus be comparable for purposes of
statistical comparisons between states.
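
As a small illustration of the common origin and unit, the sketch below assumes that scores were first computed on a standard normal metric (mean 0, standard deviation 1) and are then re-expressed on the example reporting scale of mean 500 and standard deviation 100. The function name and the input values are hypothetical.

```python
def to_reporting_scale(theta, psd, mean=500.0, sd=100.0):
    """Re-express a score and its posterior standard deviation on the reporting scale.

    The score is shifted and stretched; the PSD is only stretched.
    """
    return mean + sd * theta, sd * psd

# A pupil half a standard deviation above the prior mean, with PSD 0.4.
print(to_reporting_scale(0.5, 0.4))
```

Applying the same transformation in every state is what keeps the resulting scale scores on one origin and unit.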

Technically, this procedure is straightforward,
computationally efficient, and statistically robust. The
greatest difficulty in its implementation is the conceptual one
of agreeing on common content domains in which the items from
the participating state assessments should be classified for
purposes of constructing attainment scales. The item domains
must be essentially unidimensional, they must correspond to
items in the National Assessment, and they must represent
important areas of the curriculum. A common effort administered
by the National Center for Education Statistics or the Education
Commission of the States would obviously be required to obtain
agreement on these points. An even better arrangement would be
one involving NAEP in which the design of the national
assessment is brought into better accommodation with the state
assessments.



References 

Bock, R. D. & Aitkin, M. (1981). Marginal maximum likelihood
estimation of item parameters: Application of an EM
algorithm. Psychometrika, 46, 725-737.

Bock, R. D. & Mislevy, R. J. (1982). Adaptive EAP estimation of
ability in a microcomputer environment. Journal of Applied
Psychological Measurement, 6, 431-444.



University of Illinois 
at Urbana-Champaign 



210 Education Building 
1310 South Sixth Street
Champaign 
Illinois 61820-090 



Department of
Educational Psychology



College of Education 



217 333-2245 



March 18, 1985 



Professor Leigh Burstein
Department of Education
University of California, Los Angeles
Los Angeles, CA 90024

Dear Leigh:

I think that Darnell Bock's description of * procedure f^r using 
NAEP to I1n<c s**te assessment results 1s Cw.. spUally sound and 
technically * as'ble. What he has described 1s a viable means 
of obtaining awe? better comparative Information than 1s currently 
possible provided stater and key federal agenci«« have sufficient 
Interest to cooperate. The main obstacles to successful Implementation 
of the system revolve around content specification, grade-level 
coverage, timing of state assessments, and the need collect 
and match state data for students 1n the NAEP sample. 

Agreement on content domains and the classification of 1te*ns 
from NAEP and each state assessment Into those domains 1s crucial. 
Th<» system cannot work without agreemen , A carefully coordinated 
e.rort among interested states, key federal agencies, and KAEP 
woul t>£ needed to achieve tL: degree of ccr.*ensus squired for 
Implementation and acceptance of the results. Ycur advisors 
who are directly Involved with state assessments could &ive you 
a better Idea of how feasible 1t 1s to accomplish this step. 

As tarrell points out, the additional data collection would be 
required 1n states where the state assessment does not match 
NAEP 1n terms of grade levels covered or the time of ><sur that 
data are collected. Resources obviously would need to be Identified 
to cover the axpenses of this additional data collection and 
analysis. Some cost estimates and maybe a pilot study 1n a couple 
of states would seem worthwhile. 

It would also seem desirable to get a better idea about the extent 
of the mismatch problem. You may already have this from your 
review of state practices, but a comparison of content covered, 
grades included in the state assessments, and time of testing 
would be helpful. We would also need to have a sense of the 
viability of matching student data from NAEP with the state assessment 
results. 

I think the idea has considerable merit. Perhaps the next step should 
be to see if any states have sufficient interest to pilot test the idea. 

Best regards, 




Robert L. Linn 
Chairperson 






RLL/Jm 



COMMENTS ON DARRELL BOCK'S TEST LINKING PROPOSALS 
J. WARD KEESLING 



1. Does Darrell have an idea of what numbers of items and students 
would be needed to make valid comparisons (or precise comparisons) 
among the states? Precision might be easy (?) to determine. 
Validity may be a more subjective call. 

2. How many states would have enough students in G4, G8, or G11 with 
NAEP scores and state assessment scores to meet the criterion in #1? 

3. How many states could be added if they would augment their samples 
with NAEP schools? 

4. How many states could be added if a G6 state test could really do as 
well in G8 as a G8 test? How many items would have to come from the 
same "domains?" 

5. How many states test at times not sufficiently close to NAEP tests? 

6. Because most state assessments will include reading, and because the 
item types may be like those used in NAEP, this would be a good 
place to try a test case. 

7. If at least 15 states can be found with reading assessments in the 
right grades at the right time of year, this would be a good test 
case. Data should be available from NAEP for 83-84 and 85-86. 

8. In states planning to assess reading and/or math in 85-86, at about 
the same time-of-year as NAEP tests and in the same grades, start 
coordinating now to make it possible to try Darrell's idea. 

9. Probably the most difficult part of this will be to identify items 
that truly belong in the same skill area or objective across the 
NAEP and SEA tests. 

10. One could use the 83-84 data as a test case (probably only in 
reading, though). 

11. A test case, such as this, may be the only way to examine the 
precision of state-by-state comparisons, and make projections about 
the numbers of people and items needed to make useful contrasts or 
rankings. 
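
Items 1 and 11 ask how many students would be needed for precise state-by-state contrasts. As a rough illustration only, the sketch below computes the standard error of the difference between two independent state means and the per-state sample size needed to reach a target standard error. The score standard deviation, the design effect, and the sample sizes are hypothetical values chosen for the example; none of them comes from NAEP or from any state program, and the formula assumes simple random sampling inflated by a single design-effect factor.

```python
import math


def se_of_difference(sd, n_a, n_b, design_effect=1.0):
    """Standard error of the difference between two independent state means.

    sd            -- assumed common student-level standard deviation (hypothetical)
    n_a, n_b      -- number of tested students in each state (hypothetical)
    design_effect -- inflation factor for clustered school samples (assumption)
    """
    var_a = design_effect * sd ** 2 / n_a
    var_b = design_effect * sd ** 2 / n_b
    return math.sqrt(var_a + var_b)


def students_needed(sd, target_se, design_effect=1.0):
    """Students per state so the difference of two means has the target SE.

    Solves SE^2 = 2 * design_effect * sd^2 / n for n (equal n in both states).
    """
    return math.ceil(2 * design_effect * sd ** 2 / target_se ** 2)


if __name__ == "__main__":
    # Two states with 400 students each, SD of 50 scale points, design effect 2.
    print(se_of_difference(sd=50.0, n_a=400, n_b=400, design_effect=2.0))
    print(students_needed(sd=50.0, target_se=5.0, design_effect=2.0))
```

A calculation of this kind speaks only to precision. As item 1 notes, validity remains a separate and more subjective judgment.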







CALIFORNIA STATE DEPARTMENT OF EDUCATION 
721 Capitol Mall 
Sacramento, CA 95814 

Bill Honig 
Superintendent of Public Instruction 



April 2, 1985 



Leigh Burstein 
Department of Education 
University of California 
Los Angeles, California 90024 

Dear Leigh: 

Please forgive my tardy response to your letter of March 8. It has been more 
than a little hectic around here with the release of the grade 12 scores and 
the excitement surrounding the financial rewarding of schools for improving 
their scores under the "Cash for CAP" program. 

I found Dr. Bock's summary of the proposed equating procedures consistent 
with our discussion last winter and as encouraging to me as when we first 
discussed them. My positive attitude rests on the moderately justifiable hope 
that we can have the best of both worlds: the manifold and manifest advantages 
of a "bottom up" approach to test content determination and credible 
state-by-state comparisons. Those comparisons will be harder to generate than 
those from a "national test" and will require some additional qualifications 
for interpretation, but the comparisons can be made. 

Some states do not now test at the "right" grade levels or the "right" time of 
year. The two-choice solution to these problems is totally compatible with the 
American philosophy of federal-state relations: (1) the remaining states will 
join the NAEP pattern, or (2) the NAEP grade levels, although selected on solid 
grounds, will be judged not to meet the needs of most states and districts and, 
therefore, ought to be changed. (A one-time break in the longitudinal 
comparisons could be accommodated by NAEP with no substantial increases in 
testing time for that one year.) 

Similarly, NAEP's biannual assessment schedule does not seem insuperable. It 
means that new state tests could be calibrated, without additional testing, 
only every other year. It would be nice to have annual state-national 
comparisons, but most of the states could still be compared on an annual 
basis. 

We are fortunate that Dr. Bock has developed such innovative and powerful 
procedures to handle what would otherwise be an intractable problem (i.e., the 
sparseness of the NAEP sampling design), thereby avoiding a complete redesign 
of NAEP's procedures. I hope that Dr. Bock's procedures can be put to the test 
under these circumstances, which are just different enough from the California 
application to make them challenging. 






Leigh Burstein 
Page 2 

April 2, 1985 



A critical issue, of course, is that of test content. Is there sufficient 
agreement among the states on the most important content to be tested? I think 
so. The fact that the content focus is always changing complicates the 
process because the changes are not uniform across the states. But that is a 
small price to pay for the assurance of a timely and genuine content validity 
as it reflects the consensus of local concerns. I am looking forward to 
hearing more of the progress you are making in probing these issues during the 
pilot study. 

To summarize, I think we are on the right track. I am biased, I admit. This 
"bottom up" approach to gaining agreement on test content is consistent with 
our efforts to design a comprehensive assessment system in California, one that 
provides the public with valid comparative information reflecting core content, 
yet allows school districts to assess other objectives in sufficient scope and 
depth to meet their local needs. 

I hope your surveying and summarizing are going well. I am looking forward to 
seeing the results of your efforts later this spring. 



Sincerely, 




Dale Carlson, Director 
California Assessment Program 

(916) 322-2200 



DC:cc 






STATE OF MICHIGAN 
DEPARTMENT OF EDUCATION 
Lansing, Michigan 48909 

PHILLIP E. RUNKEL 
Superintendent of Public Instruction 

STATE BOARD OF EDUCATION 
NORMAN OTTO STOCKMEYER, SR., President 
BARBARA DUMOUCHELLE, Vice President 
BARBARA ROBERTS MASON, Secretary 
DOROTHY BEARDMORE, Treasurer 
DR. EDMUND F. VANDETTE, NASBE Delegate 
CARROLL M. HUTTON 
CHERRY JACOBUS 
ANNETTA MILLER 
GOV. JAMES J. BLANCHARD, Ex-Officio 

April 10, 1985 

Dr. Leigh Burstein 
Center for the Study of Evaluation 
UCLA Graduate School of Education 
Los Angeles, California 90024 

Dear Leigh: 

As you requested, I am providing you my comments and reactions to 
the paper by Darrell Bock that you sent me. I am sorry that I will be 
unable to join you in Washington, D.C., April 15th and 16th, but I have a 
conflict with a meeting of our State Board of Education on those dates. 
My reactions to Darrell's ideas for using NAEP to link state assessment 
results are based both on my experience of directing the program here in 
Michigan, as well as having been a NAEP staff member in the late 60's and 
early 70's. Therefore, I am familiar with NAEP, its objective and item 
development procedures and sampling design. 

NAEP has proposed a direct state-NAEP comparison for each state (which, 
if all states elected it, would allow state-to-state comparisons as well). I 
am opposed to it for Michigan because 1) the skills tested don't match 
Michigan objectives; 2) the skills were by and large developed without the 
input of state departments of education curriculum specialists; 3) the range 
of difficulties of items NAEP uses is purposely manipulated to produce a 
test with one-third very difficult (p = .1) items, one-third medium difficult 
(p = .5) items and one-third easy (p = .9) items. In Michigan (and many other 
states), what is tested is what all students should know, regardless of the 
distribution of difficult or easy items; and 4) the cost of a state sample 
on NAEP is greater than or equal to that of testing all students at several 
grades in one subject area. Every-pupil data is far superior to sample data 
for improving schools. Since we are trying to add another subject area 
to the every-pupil assessment program here, cost is a very big item. 

I was hoping, when I proposed to use NAEP as an anchor test, that little 
additional NAEP-type testing would be needed. However, Darrell states on 
page one of this paper that additional testing would be needed in states 
that only test in a sample of schools, which do not test students in grades 
4, 8 and 11 or which test at a different time than NAEP's planned "spring" 
testing period of March-May. While Michigan tests all students, we test 
early September to early October in grades 4, 7 and 10. At least special 
bridge studies would be needed and perhaps it would be necessary to test 
samples of students in grades 4, 8 and 11 in the spring each time NAEP tests 
are given. 






Leigh Burstein 
April 10, 1985 
Page 2 



However, I do not see that changes in the conduct of state assessment 
to conform to NAEP practices would be a "better solution." I have cited the 
lack of conformance of skills tested, how the tests are built (NAEP never 
has specified that students ought to know anything they test), plus the 
very high cost of NAEP for just sample results. NAEP simply has limited 
utility in states that have strong state assessment programs. Since NAEP's 
purposes and methodology are different, it doesn't make sense to impose it 
on states. 

On the other hand, there are quite a few similarities among the states 
with strong assessment programs. It would make more sense to capitalize 
on the commonalities of these programs and impose it back on NAEP. NAEP 
could collect, as one part of its data collection efforts, how the nation's 
students are doing on the skills that states think are most critical for all 
students to know in mathematics, reading and other areas. I believe the 
CCSSO proposal to develop a common core of competencies is heading in this 
direction, although I don't believe that they make any mention of using NAEP 
to collect the data. 

While I understand that NAEP could be used to link state assessment 
results, my feeling is that it isn't worth the costs, either financial 
or curricular. I believe that whatever measure is used to compare the 
schools in Michigan with those of other states should first be defensible on 
the basis of content. I fear that if NAEP is used to link states and 
considerably more testing is needed, the focus will be on NAEP performance, 
not state assessment results. I cannot defend the NAEP objectives as 
appropriate for all students here. Since the development of an adequate 
linking measure will take time, I believe we should direct our efforts to 
more curricularly-defensible techniques, such as the CCSSO proposal. 

I hope these comments will prove useful to you and the committee. 
If you wish for me to elaborate on any of the points I have made, please 
feel free to contact me. I am sorry that my schedule won't permit me to 
join you next week. 




Edward D. Roeber, Ph.D. 
Supervisor 

Michigan Educational 
Assessment Program 



EDR/pg 






Illinois State Board of Education 
EDUCATION IS EVERYONE'S FUTURE 

100 North First Street 
Springfield, Illinois 62777 
217/782-4321 

Walter W. Naumer, Jr., Chairman 
Illinois State Board of Education 

State Superintendent of Education 

April 11, 1985 



Leigh Burstein, Ph.D. 
Center for the Study of Evaluation 
UCLA Graduate School of Education 
Los Angeles, California 90024 

Dear Leigh: 

We appreciated the opportunity to review the proposal by Darrell Bock which 
came with your March 8, 1985 correspondence. 

There are several questions which are raised in the issues discussed by 
Bock. These are: 

1) Which prior distributions will be chosen to generate the posterior 
densities in this model? Should the priors be baseline information 
from past NAEP assessments? Should the priors vary state to state 
or be set nationally? Furthermore, who should have the 
responsibility to decide what these priors should be? 

2) It is true that posterior density estimates of scores can be 
generated by the model presented by Bock. A lingering question is 
how well will scores produced by such a model represent the 
students from which they are derived? That is, how will the 
psychometric model presented by Bock interweave with a sampling 
model to produce results proportionate to the number and type of 
students spread out across the United States? Would the posterior 
score estimates by Bock then be weighted by sampling parameters to 
produce results for each state which would be useful to and 
representative of that state? 

3) A related issue is that of the size of the population needed for 
this numerical integration. It would appear that the requisite 
sample size for such integration and maximum likelihood estimates 
would be large. As the number of educational domains and items 
therein increase, the N required will also increase. The need for 
certain levels of total N for psychometric stability may militate 
against the needs for certain representative N by states discussed 
in (2) above. 







4) One major concern is that of dimensionality. Will the item 
response analysis find unidimensionality (even within one domain) 
across the items and many different types of students from 
throughout the United States? A major effort could be conducted on 
a state by state basis (of those states participating) to assure 
the relevance of the items used with the curriculum taught in the 
state. It is simply not sufficient to have NAEP define "important 
areas of the curriculum." Work by Harnisch and others has shown 
how the measurement models vary by curricular differences among 
schools. 

5) One suggestion might be the adoption of a weighted collateral 
information model of the sort discussed by Novick and Jackson 
(1974). That is, the data used for comparisons among states and 
for students would be a weighted composite of several components 
tapping the different levels in this analysis, each component 
weighted by its own generalizability coefficient. That is, the 
student's score would be weighted by the generalizability of data 
at the student level, added to the state mean weighted by the 
generalizability of data from that state, and combined with the 
overall national mean weighted by generalizability at the national 
level. We have attached an article which describes this process. 

6) On what basis can a claim be made that the NAEP tests "probably 
have sufficient range of difficulty?" We have not seen such 
empirical evidence. In Illinois, scaling of NAEP items by LOGIST V 
has shown them to be restricted in their difficulty, usually to 
unacceptably low levels. For example, the parameters of the NAEP 
items were much lower in difficulty and discrimination than those 
designed by our own staff and committees. In reading, for 
example, the NAEP items were answered correctly by 80% to 90% of 
our students. 
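
The weighted composite suggested in item 5 can be pictured with a small sketch. It is not the Novick and Jackson (1974) model itself, which the letter only cites; it is a simpler Kelley-type regressed estimate used here as a stand-in. The function name, the two-stage weighting, and all numeric values are assumptions made for illustration.

```python
def composite_estimate(student_score, state_mean, national_mean,
                       student_reliability, state_reliability):
    """Two-level shrinkage estimate (hypothetical weighting scheme).

    The state mean is first regressed toward the national mean according to
    the reliability of the state-level data; the student's score is then
    regressed toward that stabilized state mean according to the reliability
    of the student-level data.
    """
    for r in (student_reliability, state_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie in [0, 1]")
    # State mean pulled toward the national mean.
    state_part = (state_reliability * state_mean
                  + (1.0 - state_reliability) * national_mean)
    # Student score pulled toward the stabilized state mean.
    return (student_reliability * student_score
            + (1.0 - student_reliability) * state_part)


if __name__ == "__main__":
    # Invented values: student 60, state mean 50, national mean 40.
    print(composite_estimate(60.0, 50.0, 40.0,
                             student_reliability=0.8, state_reliability=0.5))
```

With perfectly reliable student data the estimate is simply the observed score; as reliability falls, more weight shifts to the state and national means, which is the sense in which such collateral information stabilizes the comparison.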







July 26, 1985 



Dr. Leigh Burstein 
Center for the Study of Evaluation 
UCLA Graduate School of Education 
Los Angeles, CA 90024 



Dear Leigh: 



I offer this letter as a minority report in contrast to 
your conclusions from the State Assessment/Quality Indicators 
Project reflected in your letter to Emerson Elliott on 22 April 
1985. I believe that the State Assessment Consortium option 
which you are advocating is by far the most costly and 
potentially the most intrusive in terms of local testing demands, 
despite state ownership. Let me spell out what I believe are the 
detractions to the State Assessment Consortium option. Then, I 
will consider the Standardized Tests model which is the most cost 
effective for certain limited purposes. Finally, I will argue 
for the feasibility of an "Expanded NAEP" in contrast to the 
equated State Assessments model. 

Obviously, the relative strengths and weaknesses of these 
options depend on the purpose of the assessment. Is the primary 
audience to be policy makers at the federal level, who seek valid 
state-by-state comparisons of pupil learning? Must the needs of 
state-level policy makers also be addressed? If so, will 
state-level decision makers be content with a summary report card 
comparing their state to other states and to the nation? Or, 
will they require more detailed, "instructionally diagnostic," 
information about relative strengths and weaknesses within broad 
subject areas? The latter, of course, requires a more sensitive 
measurement instrument with concomitant increases in cost. I want to 
note that this latter type of comprehensive in-depth assessment 
is not in keeping with the usual connotations of the term 
"indicator." 



STATE ASSESSMENT CONSORTIUM 



I did not raise any technical objections to Darrell 
Bock's memo of March 1, 1985, describing the procedure for 
linking state assessments via NAEP. Dr. Bock was very accurate 
in anticipating the number of special samples and special studies 
that would be required to implement such a design. It was not 
his purpose to offer a cost analysis. (However, once one 
attaches reasonable numbers to each special provision, the cost 
implications are clear.) Committee members who favor this plan 
obviously value state ownership of the test content so highly 
that they believe the extra cost is warranted. 

COST. If EVERY state gave tests in the SAME CONTENT 
AREAS as NAEP, at the SAME GRADE LEVELS as NAEP, at the SAME TIME 
OF YEAR as NAEP, in precisely the SAME SAMPLE OF SCHOOLS as NAEP, 
and if the NAEP SAMPLES WERE ALWAYS LARGE ENOUGH, the linking of 
state assessments would clearly be cheaper than an expanded 
National Assessment because the extra cost of the equating 
analysis would more than be offset by the savings in test 
administration, i.e., no additional sampling or testing would be 
required. However, none of these ideal matches is satisfied; 
hence, the need for expensive corrective strategies. 

If one wishes to have data for all 50 states, which is 
presumably essential for FEDERAL audiences, then the equivalent 
of an expanded NAEP is needed in those states without a state 
assessment AND in those states for whom current NAEP samples are 
too small. According to your survey, at least 12 states do not 
have ANY state assessment or minimum competency tests. (I have 
excluded local district tests since these would require equating 
or anchor studies district-by-district.) Many more states are 
missing tests at one or more of the NAEP grade levels OR can be 
expected to have too sparse a NAEP sample for equating purposes. 
Because NAEP selects a sample to be representative of an entire 
region, the state samples are not necessarily large enough EVEN 
FOR EQUATING, as Darrell pointed out. Smaller population states 
such as New Mexico, Nevada, Maine, Montana, Alaska, would likely 
require augmented NAEP samples. Thus, in any kind of cost 
comparison the cost for these states would be roughly comparable 
to the expanded NAEP design. 

Most states with testing programs test in reading and 
math and usually at at least two of the three school levels, 
elementary, middle, and high school. As Darrell has indicated, 
whenever a state does not test at grades 4, 8, and 11, the state 
will have to arrange special administrations of their tests at 
NAEP schools and at NAEP grade levels. Although I am willing to 
acknowledge that equating samples do not have to be as large as 
assessment samples, I am assuming that in these cases of 
mismatched grades it would not be possible to use the DATA from 
the regular state assessment, only the TESTS. If the data from 
the next higher or next lower grade were used, it would require a 
statistical extrapolation of performance level that I do not 
believe is defensible politically. If you are willing to live 
with such extrapolations, because they provide rough "indicators" 
of the relative standing of state systems, fine; but then I don't 
think you should be so snobbish about nuances of content quality. 
Of course, if you don't extrapolate from the regular state 
assessments, then the NAEP-grade special administrations must be 
large enough to stand as the assessment samples. 

A few more states, Connecticut, Illinois, Minnesota, 
Missouri, Oregon, Rhode Island, Tennessee, Utah, and Wisconsin, 
will require special state sampling if they do not already have a 
"piggyback" arrangement with NAEP. These states test only a 
sample of pupils rather than every pupil at a grade level. 
Unless there has been a specific contract with NAEP (which was at 
one time true in Minnesota, I know) the NAEP sample is not likely 
to coincide with the state sample. Thus, the state will have to 
add NAEP schools to the state sample. 

Whenever state tests are not given at the same time of 
year as NAEP, special studies will have to be carried out to 
adjust performance to a common time. Now that NAEP is moving to 
a spring testing period (February - May), I expect this will be 
the least frequent source of difficulty. When they do occur, of 
course, these studies are an additional expense. 

If the State Assessment model is put forward as the 
[illegible] solution, it should be accompanied by a realistic cost 
[illegible]. 

INTRUSION. The equating plan relies heavily on the 
cooperation of local school personnel (the principal and 
[illegible] associated with [illegible]). It can be done and ETS 
has had reasonable success doing so in small-scale studies of 
their own. An equivalent effort is required to match names to 
state IDs. Even if we are only [illegible] time, and even if a 
battalion of field supervisors are hired ($$$) to see it done 
properly, I believe there will be errors and missing data created 
by the negative reaction. This is an unforeseen burden falling 
on those who agreed to be NAEP schools. 

Even more intrusive is the implicit expectation that 
ultimately the costs of such a system will diminish as the states 
ADJUST THEIR ASSESSMENTS TO THE NAEP DESIGN. (Dale Carlson 
mentioned in his memo that NAEP might also change to fit more 
popular grade levels. But, when you consider that the precise 
choice of grades is arbitrary and that there is no other more 
[illegible] set of grades than 4, 8, 11, the direction of 
conformity is clear.) It is ironic that a plan that has state 
ownership as its principal attraction would have such compliance 
[illegible]. Not only would states disrupt their own change data, 
but [illegible] be only one all-powerful federalist 
system. If you didn't like what this test said about you, there 
would be no other recourse. Whereas, a NAEP test would never be 
[illegible] about the state test and progress over time. 

NOT ALL STATES. Of course, my Cassandra-like cost 
[illegible] exaggerated if you have no intention of including all 
50 states. If instead you included only 25 states who were 
interested, had large populations and their own extensive 
assessment programs, and fit the NAEP design at least in part, 
[illegible] expanded NAEP to the federal government. Let us be clear, 
however, that such a plan would only serve state-level policy 
makers by providing them with national comparisons [illegible]. 
[illegible] why we are addressing such advice 
to Washington officials unless they see themselves as 
facilitators of state-level decision making. 

I do not believe that the documents circulated thus far 
[illegible] to see that the State Assessment model 
is a not-all-states solution. 

"BOTTOM UP CONTROL OF CONTENT." There is a troubling 
contradiction in believing that individual state tests are 
[illegible] different enough to justify the elaborate linking 
design but similar enough to satisfy the requirements of IRT. I 
[illegible] applications of IRT [illegible] the 
limited number of items per subtest (4-5) was overcome by a 
total-test analysis (assuming unidimensionality) but then users 
[illegible] to obtain differential diagnostic information from the 
subtests. Dr. Bock has never been guilty of such foolishness. 
Instead, he has advocated scaling of "indivisible curricular 
elements." Less sophisticated audiences are more likely to trust 
in the magic of IRT and believe that they can have their cake and 
eat it, too. 

Let's make it explicit that if a state has a unique 
content element that is not represented in the NAEP test, it 
cannot be equated. In essence, the grand scheme allows states to 
be ranked on their own items that most resemble the NAEP content. 
It is a fiction that their unique objectives can raise them on 
the NAEP ranking. 

READING, A SPECIAL CASE. Finally, the enthusiasm for the 
State Assessment model should be tempered by the warning that the 
equating strategy could work in reading and NOT in other 
subjects. Reading is not only the most universally assessed 
area, it is also the most uniformly defined and best satisfies 
the unidimensionality requirements. 

STANDARDIZED TESTS. Nearly every school district in the 
country administers standardized tests of achievement. Only 
about five or six major batteries account for 90% of the market. 
One way to gather credible comparative data is to draw a 
representative sample of school districts in each state and to 
require (presuming a federal mandate) selected districts to 
report their aggregate scores by grade tested, sample size, time 
of testing, and form of the test used. Normative standing for 
each district and then state could be averaged across grades and 
tests based on equivalencies derived from one national anchor 
study. Unlike the State Assessment model, separate equating 
studies would not have to be done in each state. Because the 
districts would supply the data and the anchor study would supply 
the conversion metrics, cooperation from the test publishers 
would not be essential. 
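
The aggregation step in this plan could be sketched roughly as follows. Everything in the block is hypothetical: the battery names, the linear conversion to a common scale, and the district records are invented for illustration, and a real anchor study would supply its own equivalencies (which need not be linear). For brevity the sketch pools all grades rather than averaging within grade first.

```python
from collections import defaultdict

# Hypothetical conversions from each battery's reported scale to a common
# anchor scale: common = slope * reported + intercept (linear form assumed).
ANCHOR_CONVERSION = {
    "Battery A": (1.0, 0.0),
    "Battery B": (0.9, 4.0),
}


def state_averages(district_reports):
    """Sample-size-weighted mean of converted district scores, by state."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rep in district_reports:
        slope, intercept = ANCHOR_CONVERSION[rep["battery"]]
        common = slope * rep["mean_score"] + intercept
        totals[rep["state"]] += common * rep["n_students"]
        counts[rep["state"]] += rep["n_students"]
    return {state: totals[state] / counts[state] for state in totals}


if __name__ == "__main__":
    # Invented district reports for a single hypothetical state "X".
    reports = [
        {"state": "X", "battery": "Battery A", "mean_score": 50.0, "n_students": 100},
        {"state": "X", "battery": "Battery B", "mean_score": 60.0, "n_students": 300},
    ]
    print(state_averages(reports))
```

Because only district aggregates and sample sizes enter the calculation, no pupil-level matching is required, which is the logistical advantage the letter claims for this model.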

I would never advocate such a plan as a comprehensive 
in-depth assessment. But, if what you want is an "indicator" of 
relative state achievement, then it would be the cheapest but 
adequate model. The logistics of DISTRICT data collections would 
be more feasible than the pupil-level coding of the state linking 
design. Furthermore, it would be easy to collect demographic 
indicators at the same time. Any of these plans must make 
provision for assessing background factors (e.g., mobility, 
percent below poverty) against which achievement results are 
interpreted. 

EXPANDED NAEP. An "expanded National Assessment" would 
involve increasing the current NAEP samples in most states to 
permit state-level results. If you believe the tests are narrow 
instruments or not as good as some state tests then the content 
could also be expanded either by lobbying NAEP or by making 
agreements with a few states to share their items. (If you 
really believe the NAEP tests are so bad, you should be lobbying 
ETS anyway.) The expanded NAEP model would be cheaper than an 
all-50-state implementation of the State Assessment model. The 
most accurate cost estimates can be obtained for this design 
because the cost is directly tied to sample size and because ETS 
has already had experience with piggybacking and with the 
southern consortium. 






As I mentioned then, the two objections to the NAEP 
solution are (1) the limitation of the tests, and (2) the 
political undoing of NAEP by making it a national test with 
authority. 

I believe you are being overly esoteric in criticizing 
the NAEP tests. Equally distinguished groups of subject matter 
experts were convened to create those tests as those in the 
respective states. And, as I indicated above, if your criticisms 
are warranted, the right thing to do is change NAEP. In fact, 
however, I believe that only a few states can boast tests that 
are "better" (in terms of content coverage or item quality, not 
just better suited to their own needs) than the NAEP tests. 
Because of the matrix design, in fact, NAEP content domains are 
much more comprehensively assessed than in most state tests. Are 
you concerned that they don't test higher order cognitive skills? 
If you're right, these elements would be missing from the 
equating design, as well. 

If you are worried about NAEP's political future, 
consider that with the move to ETS, NAEP has already abandoned 
its character as the dull monitor of an achievement time series. 
The NAEP staff have promised to deliver a national report card 
and are aggressively trying to make the NAEP data as visible and 
useful (hence political) as possible. Furthermore, your state 
assessment model with its dependence on NAEP and its evolutionary 
adaptation to the NAEP design will eventually give the NAEP tests 
the authority you seek to avoid. The dozen biggest states might 
be likely to keep their own assessments, but if the State 
Assessment model were fully in place, one wonders if smaller 
states would be motivated to maintain their own assessments 
instead of adopting the NAEP tests as well as the NAEP schedule. 
When you come right down to it, it is the largest states with 
visible assessment programs for whom the ownership issues are the 
most salient. Smaller states might prefer the NAEP design to the 
expensive linking system. 

Please find an appendix somewhere for my contrary 
opinions. 




Lorrie A. Shepard 
Professor 



sm 






APPENDIX 8 



7/1/85 



SUMMARY OF DOCUMENTS PROVIDED BY STATE TESTING PROGRAMS 
STATE TESTS AS QUALITY INDICATORS PROJECT 



STATE     CODE*   REPORT TITLES 

Alabama   St      High School Graduation Examination State Report: Reading 

          St      Basic Competency Testing Program State Report: Reading 
                  (Grade 3) 

          St      Chief State School Officer Summary Report: California 
                  Achievement Test, 1977 Edition 

Alaska    C       Alaska Statewide Student Assessment Program. Reading 
                  Skills Objectives: Grade 8. Field Review Edition 

          C/T     Portland Developmental Items. Mathematics: Grades 4 & 8. 
                  Reading: Grades 4 & 8 

          T       Statewide Achievement Test in Reading and Mathematics: 
                  Grade 4 

          St      Report on the 1981 Alaska Statewide Assessment Tests 

          St      Alaska Statewide Student Assessment: A Comparison of the 
                  1977 and the 1979 Assessment Results 

          St      Results of the 1983 Statewide Assessment Tests 

          M       Alaska Instructional Diagnostic System: Pilot Test Results 

          M       AIDS - An Evaluation of the Use of AIDS by Teachers 

          TM      AIDS - Skill Sheets Reading (General Information) 

          C/TM    Structural Analysis (Skill Survey Sheets, Reading) 

          TM      AIDS - Lower Level Skill Surveys (General Information) 

          M       AIDS - Overview 

          TM      AIDS - Upper Level Skill Surveys (General Instructions) 

          M       AIDS - Workshop Overview 

          T       AIDS - Student Booklet (Mathematics). Upper Level Skill Surveys 

          C       Cross-Reference Guide (Computational Skills & Alaska 
                  Objectives and Items Bank) 



YEAR 
1983 
1983 

1983 

Jan. 76 

1982 



1981 

1983 
1978 



1977 



*Key to document attached at end. 







STATE CODE 
Arizona St 



REPORT TITLES 

Arizona Pupil Achievement Testing: Statewide Report 



Arkansas St/D Analysis & Interpretation of the Results of the Arkansas 
Norm-Referenced Testing Program 

Analysis & Interpretation of the Results of the Arkansas 
Minimum Performance Testing Program 



St 

California D 
C 
C 



California Assessment Program: Four Year District Summary 

Survey of Basic Skills: Grades 3 & 6. Rationale & Content 

Survey of Academic Skills: Grade 8. Skill Areas Assessed 
in Reading & Written Expression. Rationale & Content 

C Survey of Academic Skills: Grade 8. Skill Areas Assessed 
in Mathematics. Rationale & Content 

St/C/Tc/D Survey of Basic Skills: Grade 6 
Part I: Content Area Summary 
Part II: Program Diagnostic Displays 
Part III: Subgroup Results 
Part IV: Using Survey Results 

Part V: Interpretive Supplement and Conversion Tables 

St Student Achievement in California Schools: Annual Report 

P/D Profiles of School District Performance. A Guide to 
Interpretation 

C Test Content Specifications for the Survey of Basic Skills: 
Written Expression and Spelling, Grades 6 & 12 

Tc/Su Interpretive Supplement to the Report on the Survey of 
Basic Skills: Grade 6 

St Student Achievement in California Schools: Annual Report 
P/D Profiles of School District Performance 

C Test Content Specifications for the Survey of Basic 
Skills: Mathematics, Grades 6 & 12 

C Survey of Basic Skills: Grade 12 

Su Survey of Basic Skills: Interpreting Results, Grade 12 

C Test Content Specifications for California State Reading 
Tests: Grades 2,3,6,12 



YEAR 
1984 
1983-84 

1983-84 

1983-84 

1983-84 

March 
1984 

March 
1984 

1982 



1982-83 
1982-83 

1975 

1980 

1981-82 
1979-80 
1975 

1981 
1984 
1975 



- 2 - 

189 



STATE CODE REPORT TITLES YEAR 

Connecticut M How Testing Is Changing Education in Conn. 1983-84 

P Mater i Remedial Standards for the 4th Grade 1984 

St Conn. Assessment of Education Progress 1983-84 

P Presentation on Conn. Assessment of Education Progress (CAEP) 

Program Update 1984 

M CAEP IV Grade 8 Objectives 1983 

M Teaching Thinking and Problem Solving 1985 

St Conn. Assessment of Educational Progress, Social Studies, 

Overview of the Assessments 1982-83 

Tc Conn. Assessment of Educational Progress Summary & Interpretations 1982-83 

St Business & Office Education Brochure, Overview of the Assessment 1983-84 

St Social Studies Summary & Interpretations 1982-83 

St Art & Music, Summary & Interpretations 1982-83 

St Science, Summary & Interpretations 1979-80 

St Math, Gr. 11, Summary & Interpretations 1979-80 

T Conn. Basic Skills Proficiency Test, Math, Form B 1982 

T Conn. Basic Writing Skills in Language Arts, Form B 1982 

T Mathematics, Gr. 11, 8, 4 1979-80 

St Conn. Ninth-Grade Proficiency Test, Summary Report 1980-81 

St Conn. Basic Skills Proficiency Test Results 1984-85 

M Objectives and Standards for Testing Program 1985 

M How Testing Is Changing in Conn. (Article) 1985 





- 3 - 

190 



STATE 



CODE REPORT TITLES 



Delaware St Educational Assessment Program. Statewide Test Results: 
Summary Report 

P Delaware Educational Assessment Program: Profile Report 



Florida 



Georgia 



Delaware Educational Assessment Program: Individual 
Item Report 

Group Right Response Report 



St 
St 
C 

St 
C 

St 

Tc 
Su 



Delaware Educational Assessment Program: Statewide 
Testing Results 

SSAT One Results. Student Assessment Test, Part I. 
Grades 3,5,8 

Item Specifications for the State Student Assessment Test, 
Basic Skills 

SSAT One & Two Results. Grades 3,5,8,10 

Minimum Student Performance. Standards for Florida Schools. 
Grades 3,5,8,11 (Reading, Writing, Mathematics) 

State, District, & Regional Report of Statewide Assessment 
Results 

Technical Report 

Statistical Supplement to the Technical Report 



C/Tc First, Fourth, and Eighth Grade Criterion-Referenced Test: 
Objectives and Assessment Characteristics 

T/C/St Student Assessment: Criterion-Referenced Tests and Basic 
Skills Tests (Content and Results) 

C/Tc Criterion-Referenced Tests (Mathematics and Reading Tests): 
Objectives and Assessment Characteristics for Third and 
Sixth Grade 



YEAR 
1983-84 



March 
1984 

March 
1984 

March 
1984 



1983 



October 
1983 

1985 



1982-83 
1985 

1983 

1983-84 
1983-84 
1983 

1983-84 

1984 
1984 



- 4 

191 






STATE 
Hawaii 



Idaho 



Illinois 



CODE REPORT TITLES 

IN Teacher's Handbook on Essential Competencies (Draft) 

St Summary Report of Statewide Testing Program 

St Summary Report of Statewide Testing Program 

M Graduation Requirements and the HI State Test of Essential 
Competencies (HSTEC), effective 1983 

T Idaho Proficiency Test: Mathematics, Reading, Spelling 

C/M Proficiency Testing Program 

Su Interpretive Guide to Computer Printouts 

TM Test Administration Manual 

St Report on Idaho Proficiency Test Results 
St Summary of the 1982 Mathematics Results of the Illinois 
Inventory of Educational Progress 

Student Achievement in Illinois: An Analysis of Student 
Progress 



Indiana 



St 

St 
T 

T 

C 

D 

Su 



School District Organization in Illinois 

The Illinois Inventory of Educational Progress: 
Grades 4,8,11 

The Illinois Inventory of Educational Progress: 
Grades 4,8,11 

Curricular Analysis of the 1982 Mathematics Results of the 
Illinois Inventory of Educational Progress 

Student Achievement in IL: An Analysis of Student Progress 

Design Specifications, the law, draft of questions/answers, 
and other related papers. 



YEAR 
1983 
1984 
1983 
1982 

1981 

1982 

1984 

1983-84 
1982 

1982-83 

1985 
1982-83 

1985 

1982 

1985 

(began 
Feb. 85) 



- 5 - 

192 



STATE CODE 
Kansas St 

St 

C 

Sc/D 
St 

Kentucky St 
Louisiana C 

T/TM 
St 

Maine St 
St 
Tc 

Maryland St 
Tc 

St 

IN 
St 



REPORT TITLES 

Kansas Minimum Competency Testing Program Report (Rating 
Scales) 

Report of Research Findings: The Kansas Competency Testing 
Program 

Kansas Minimum Competency Objectives 
Identifying Minimum Skills 

Kansas Minimum Competency Assessment Report: Reading and 
Mathematics 

Comprehensive Tests of Basic Skills. Statewide Testing 
Results: Grades 3,5,7,10 

Louisiana Basic Skills Testing Program. Language Arts & 
Mathematics Item Specifications: Grade 2 (1981), Grade 5 
(1984-85), Grade 4 (1983-84), Grade 3 (1982-1983) 

Louisiana Basic Skills Testing Program. School Test 
Coordinators Manual: Grades 2, 3, 4, & 5 Basic Skills Tests 

Basic Skills Testing Program. Annual Report: Grades 2,3,4 

Assessment of Educational Progress: Reading & Language 
Arts Results. Grades 4,8,11 

Maine Assessment of Educational Progress: Reading & Language 
Arts. Summary & Interpretive Report, Grades 4,8,11 

Maine Assessment of Educational Progress: Reading & Language 

Arts. Technical Report, Grades 4,8,11 

Facts About Maryland Public Education. A Statistical 

Facts About the California Achievement Test, Maryland 
Functional Reading Test, & Maryland Mathematics Test. 

California Achievement Test Results: Grades 3,5,8. 
Maryland Functional Reading Test: Grades 9-12. Maryland 
Functional Mathematics Test: Grade 9. 

Project Basic Instructional Guide: Volumes V & VI. Functional 
Mathematics & Functional Reading 

Maryland Accountability Testing Program: Annual 
Report 



YEAR 
1983 

1980 

1984-85 

1982- 83 



Spring 
1984 



1984-85 

1983-84 
1982 

1982 

1982 

1983-84 

1980-84 

1980- 84 

1981- 82 



1981- 82 

1982- 83 






- 6 - 

193 



STATE 



CODE REPORT TITLES 



Massachusetts Tc 



St 

Michigan St 
Su 
C 



Basic Skills Improvement Policy. An Implementation 
Evaluation of the Basic Skills Improvement Policy: Technical 
Appendix. 

Basic Skill Improvement Policy. Statewide Summary of Student 
Achievement of Minimum Standards in the Basic Skills of 
Reading, Writing, & Mathematics 

Mathematics Education Interpretive Reports: Grades 4, 7, 10 
MEAP Support Materials for Mathematics 

Minimal Performance Objectives for Mathematics & 
Communication Skills (Reading, Writing, Speaking/Listening) 



Michigan Tm 
(Cont. ) 

St 



Tc 

Minnesota T 

Tc/St/D 
T 



C/Tc/Su MEAP Handbook 

Coordination & Administration Manual: Grades 4, 7, 10 
MEAP Statewide Results 
Technical Report: Volume I & II 
Minnesota Statewide Educational Assessment in Art 
Performance in Basic Mathematics 



St 
St 
T 

St 

Mississippi Su 



Minnesota Statewide Educational Assessment in Literature 
and Mathematics 

Results of a Statewide Assessment Program Utilizing the 
Minnesota Secondary Reading Inventories 

Results of Minnesota Statewide Educational Assessment in 
Music 

Statewide Educational Assessment in Reading 

Results of Statewide Educational Assessment in Social 
Studies 

Programs on Performance Testing Accepted October 18, 
1984. Literature Regarding this & Preliminary Facts 
Pertaining to the 11th Grade Tests. 



Missouri St Statewide Assessment Data Summary (Grade 12) 

Su Interpretive Report: Grade 12 and 6 

Su Educational Goals 

C Educational Objectives 



YEAR 
1983 



1981, 
1983, 
1984 

1980-81 



1984-85 

1984 

1983-84 

1980- 81 

1981- 82 

1979- 80 

1982- 83 

1982-83 

1980- 81 

1981- 82 
1981-82 



Fall 
1983 

1976-77 



1982 






- 7 - 

194 



STATE 

Missouri 
(Cont.) 



CODE REPORT TITLES 



C 

St 



Montana T 
St 



Nevada 



St 



New St 
Hampshire 

New Jersey St 

St 

T 

C 

TM 
T 

TM/Su 
New Mexico St 

D 

St 
St 
St 
New York Su 

TM 

TM 



Performance Indicators for Educational Objectives: Grades 
6 and 12 

State Assessment Data Summary 

Montana School Testing Service Test Booklet: Grade 6 & 11 

Results of 1984 Montana School Testing Service for Montana 
State Totals (Elementary & Secondary) 

The Nevada Proficiency Examination Program. A Brief 
Description and the Results of the 1983 Examinations 

Summary Report on Educational Assessment Program 

Statewide Testing System (New Jersey Public Schools) 
Minimum Basic Skills Test Results 

High School Proficiency Test: Grade 9. Statewide Results 

Statewide Testing System. High School Proficiency Test. 
Directory of Test Specifications & Items 

Statewide Testing System. High School Proficiency Test. 
School District Guidelines 

Minimum Basic Skills Test: Grade 9 

Minimum Basic Skills Test: School District Guidelines 

Highlights of Results: High School Proficiency 
Examination 

School District Profile 
ACT & SAT Results 

Standardized Testing Program Report 
Dropout Study 

Regents Competency Testing Program (Information 
Bulletin) 

Regents Examinations & Competency Tests. School 
Administrator's Manual 

Reading Test: Grades 3 & 6. Manual for Administrators & 
Teachers 



YEAR 
1974-75 

April 

1984 

1984 

1983 

1978,80 

Jan. 83 
1983-84 
1983-84 
1983-84 

1983-84 

1983-84 

1983-84 

Spring 
1984 

1982-83 

1982-83 

1982-83 

1982-83 

1982 

1983 

1984 



- 8 - 

195 



STATE CODE REPORT TITLES YEAR 

New York TM Mathematics Tests: Grades 3 & 6. Manual for 1984 
(Cont.) Administrators & Teachers 

TM New York State Preliminary Competency Test in Reading. 1982 
Manual for Teachers & Administrators 

T Writing Test: Grade 5 1984 

T Preliminary Competency Test in Writing 1984 

TM New York State Pupil Evaluation Program & Preliminary 1984-85 
Competency Tests 

St Grade 3 & 6 Reading and Math Test Results 1983 

Sc/D Regents Examination, Competency Test, & High School 1982-83 
Graduation Statistics 

North St Competency Test Program: Report of Student Performance Fall 

Carolina 1983 

St Annual Testing Program: Basic Skills. Report of Student Spring 

Performance Update from Spring 1981 to Spring 1984 1981-84 

Oregon St Oregon Statewide Assessment: Summary Report 1982 

Pennsylvania Tc An Analysis of Changes Across Time for Schools Participating 1978-81 

in Educational Quality Assessment 1979-81 

I/C/St Educational Quality Assessment (EQA), Results from 1978-1981 1982 
Grades 5, 8, & 11 

I/C Getting Inside the EQA Inventory: Grades 5, 8, & 11 1982 

TM Testing for Essential Learning & Literacy Skills (TELLS): 1984 
Guidelines for Testing 

P/C TELLS: Guidelines for Remediation 1984 

Su/I/D Manual for Interpreting Secondary School Reports 1984 

Su/I/D Manual for Interpreting Intermediate School Reports 1984 

Su/I/D Manual for Interpreting Elementary School Reports 1984 

M PASCO Journal Spring 

1982 

Rhode St Statewide Assessment Program: Basic Skills Testing Results 1982-83 

Island and Life Skills Testing Results 1983-84 




- 9 - 



196 



Tennessee C 
Tc 



STATE CODE REPORT TITLES 

South St Basic Skills Assessment Program. Cognitive Skills Assessment 
Carolina Battery: Preliminary Results 

St Basic Skills Assessment Program: Preliminary Report 

St Statewide Testing Program: Summary Report 

C Teaching and Testing Our Basic Skills Objectives (Reading): 
Grades 9-12 

M Measuring Educational Progress in the South: Student 
Achievement 

Proficiency Test Objectives. Their Domains with Sample 
Test Items 

Statewide Assessment Program: Basic Skills Executive Summary 
and Basic Skills Technical Report 

Texas St/Tc Assessment of Basic Skills. Part I: Project Report. Part 
II: Technical Report 

St Assessment of Basic Skills: Statewide and Regional Results 
as Reported 

C Assessment of Basic Skills: Reading Objectives, Writing and 
Math Objectives, and Measurement Specifications (Grades 
3,5,9) 

Utah D/Tc Educational Quality Indicators 

M An Analysis of Nation "Indicators of Risk" 

St Statewide Educational Assessment: General Report 

Virginia St Report on Public Education 
St Spring 1984 SRA Test Results 
St Minimum Competency Test Results 

Su Statistical Data on Virginia's Public Schools 

Washington St/D State General Report and District Level Summaries (reading, 
spelling, language arts, mathematics): Fourth Grade 



West 

Virginia 



St 15th Report ; State-County Testing Program 

T Student Questionnaire, Cognitive Abilities Test, Level F, 
Form 3, Grade 9 



YEAR 

Fall 
1984 

Spring 
1984 

1984 

1983 

1984 

1983 
1982-83 
1982-83 
1980-83 

1986 ' 

1983 

1983 

1981 

1984 

1984 

1982-83 
Feb. 84 



Fall 
1983 

1982 



1984 




- 10 - 



197 



STATE CODE REPORT TITLES YEAR 
Wisconsin St Pupil Assessment Program Report 1977-83 

Wyoming M Handbook for Establishing Minimum Competency Programs in 1982 
Wyoming Schools 



- 11 - 

198 



KEY TO DOCUMENTS PROVIDED BY STATE TESTING PROGRAMS 



Code Category 

C Content Specification 

D District Summary 

G Group Right Response Reports 

I Item Report 

IN Instructional Guide 

M Misc. 

P Profile 

Sc School Report 

St Statewide Report 

Su Support Materials (e.g. "interpreting results") 

T Test 

Tc Technical Report 

TM Test Administration Manual 
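
Many entries in the listing carry compound codes such as "C/Tc/Su" or "Sc/D". The following is a minimal sketch, not part of the original project, of how such codes could be expanded with the key above. The names CODE_KEY and expand_codes are hypothetical, and matching codes without regard to capitalization is an assumption made because the listing prints both "TM" and "Tm".

```python
# Category names copied from the "Key to Documents" list. Keys are
# lower-cased so that "TM" and "Tm" resolve to the same entry (an assumption).
CODE_KEY = {
    "c": "Content Specification",
    "d": "District Summary",
    "g": "Group Right Response Reports",
    "i": "Item Report",
    "in": "Instructional Guide",
    "m": "Misc.",
    "p": "Profile",
    "sc": "School Report",
    "st": "Statewide Report",
    "su": "Support Materials",
    "t": "Test",
    "tc": "Technical Report",
    "tm": "Test Administration Manual",
}


def expand_codes(compound):
    """Split a compound code such as 'C/Tc/Su' and look up each part.

    Parts that are not in the key come back as None rather than a guess.
    """
    parts = [part.strip() for part in compound.split("/") if part.strip()]
    return [CODE_KEY.get(part.lower()) for part in parts]


print(expand_codes("C/Tc/Su"))
print(expand_codes("Sc/D"))
```

Returning None for an unrecognized part keeps garbled codes visible instead of silently mapping them to a category.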



199 




APPENDIX 9 



MASTER MATRICES 
FOR MATH, READING, WRITING* 



State: 

Grade: 

Source: 

Year: 

Test: 



MATH 

HIERARCHICAL PROCESS --> 

SKILLS: NUMBERS, NUMERATION 

RECALL (10) 
o math facts 
o count 
o order 
o place value 
o symbols/word 
o # line 
o equiv. sets 
o equiv. fract. 
o properties of #s 
o identity elements 

ROUTINE MANIPULATION (11) 
compute: 
o integers 
o fractions 
o ratios 
o decimals 
o % 
o expanded notation 
o sequences 
o factors/mult 
o rounding 
o simple word problems 
o pos/neg # 

EXPLAIN, TRANSLATE, JUDGE (8) 
o computational estimation 
o know when to estimate 
o draw conclusion 
o ID assumption 
o select fact 
o sel. algorithm 
o sel. question 
o sel. problem modeled 

PROBLEM SOLVING (2) 
o est. in word problem 
o hard word probs: 2-step, %, interest, disct, finance charges 

SKILL TOTALS (31) 



SKILLS: VARIABLES, RELATIONSHIPS 

RECALL (3) 
o facts, def. 
o symb. of alg. 
o laws of trig. 

ROUTINE MANIPULATION (4) 
o solve equalities & inequalities 
o read graphs 
o graph points/lines 
o complete function table 

EXPLAIN, TRANSLATE, JUDGE (2) 
o give equ. for given info 
o interpret formulas 

PROBLEM SOLVING (3) 
o solve probs w/ equations, trig 
o logic problems 
o graph problems 

SKILL TOTALS (12) 



SKILLS: GEOMETRY... SIZE, SHAPE 

RECALL (2) 
o def terms 
o recog. shape 

ROUTINE MANIPULATION (1) 
o find area, circumference, perimeter (simple) 

EXPLAIN, TRANSLATE, JUDGE (2) 
o translate words into symbol, fig 
o how fig looks from other 

PROBLEM SOLVING (4) 
o geom prob solvg 
o show 2 shapes congruent 
o apply theorems to solve probs 
o draw diagrams to solve problems 

SKILL TOTALS (9) 






* Used to categorize & count test items & subskills. Each "o" indicates a subskill. 
This list is fairly comprehensive but does not contain every subskill tested. 

200 



State: 

Grade: 

Source: 

Year: 

Test: 

MATH CONT. 



SKILLS: MEAS'MT (include maps, $, time, dist, weight, temp., etc.) 

RECALL (3) 
o def terms 
o equivalents 
o order 

ROUTINE MANIPULATION (3) 
o compute 
o conversions 
o reading instruments/measure 

EXPLAIN, TRANSLATE, JUDGE (3) 
o identify most approp. unit to use 
o compare amts 
o est. sz of common things 

PROBLEM SOLVING (2) 
o word probs w/ measurement 
o estimate in word prob. 

SKILL TOTALS (11) 



SKILLS: STATISTICS, PROBABILITY 

RECALL (1) 
o def of terms* 

ROUTINE MANIPULATION (3) 
o compute mean, mode, median, range, etc. 
o organize data in table 
o compute probability 

EXPLAIN, TRANSLATE, JUDGE (1) 
o interpret data 

PROBLEM SOLVING (1) 
o draw inferences from data 

SKILL TOTALS (6) 



SKILLS: TECHNOLOGY:* CALCULATIONS, COMPUTERS. 

RECALL 
symb., terms 
flow charts 
Basic 

ROUTINE MANIPULATION 
read flow chart 
calculator computation 

EXPLAIN, TRANSLATE, JUDGE 
approp. time to use calc. & computer 
nonroutine computation 

PROBLEM SOLVING 
calc. application 
solve probs 



SKILLS: ATTITUDE* 



COGNITIVE TOTALS: RECALL (19), ROUTINE MANIPULATION (22), EXPLAIN/TRANSLATE/JUDGE (16), PROBLEM SOLVING (12), SKILL TOTALS (69) 

* Did not occur on any tests. 



201 



State: 

Grade Levels: 
Source of Infor: 
Year: 

Test Used: 

READING 
HIERARCHY LEVEL 

SKILLS: WORD ATTACK 

RECALL (6) 
o phonetics 
o syllabication 
o affixes, roots 
o compound words 
o contractions 
o inflectional endings 

SKILL TOTALS (6) 



SKILLS: VOCABULARY 

RECALL (2) 
o meaning in isolation 
o signs 

LITERAL COMP. (2) 
o meaning in context 
o multi-meaning 

INFERENTIAL, EVALUATIVE COMPREHENSION (2) 
o analogy 
o nonsense in context 

SKILL TOTALS (6) 



SKILLS: COMPREHENSION 
(note: content may be reg. para or "life skills," e.g., ads, etc.) 

LITERAL COMP. (7) 
o details 
o main idea 
o title 
o referents 
o sequence 
o cause/effect 
o follow directions 

INFERENTIAL, EVALUATIVE COMPREHENSION (20) 
o details/support 
o main idea/summary 
o title 
o irrel/miss'g info 
o missing words 
o sequence 
o cause/effect 
o conclusions 
o predictions 
o emot appeals 
o fact/opinion 
o A's purposes/attit. 
o A's methods 
o analys character 
o figurative lang. 
o tone/emotion 
o contrast/compare 
o identify org. used 
o setting/plot/dialog 
o identify lit. type 

APPLIC'N (2) 
o select best X for given purpose 
o apply info to new situation 






202 



State: 

Grade Level : 
Source of Info: 
Year: 

Test used: 

READING CONT. 
HIERARCHICAL LEVEL 

SKILLS: STUDY SKILLS 

ROUTINE MANIP. (4) 
o use info sources, e.g., dic/guide words, index/tab of c 
o use card catalog 
o use maps, charts 
o alphabetize 

EVALUATE (1) 
o identify which source to use 

SKILL TOTALS (5) 



SKILLS: ATTITUDE 



COGNITIVE TOTAL: (8) (23) (2) (46) 




203 



State: 

Grade Levels: 
Source of Info: 
Year: 

Test Used: 



WRITING 

SKILLS (rows): CONVENTIONS; GRAMMAR (sentence structure); WORD USAGE; ORGANIZATION; ATTITUDE 

RECALL 
(1) 
o identify types of sentences 

LITERAL COMP. 

CONVENTIONS (7) 
o capitalize 
o punctuate 
o abbreviations 
o spelling 
o suffixes 
o plurals 
o contractions 

GRAMMAR (5) 
o parallel structure 
o complete sent. 
o compound, complex 
o subj/pred 
o parts of speech 

WORD USAGE (6) 
o misplaced modifiers 
o language choices 
o subj/verb agreemt 
o transition words 
o dbl. negs 
o pronouns 

ORGANIZATION (9) 
o effective sent 
o sequence words 
o sequence sent. 
o seq paragraphs 
o select topic sent 
o sel important detail 
o sel info. to include 
o letter format 
o fill out forms 

EXPLAIN, INFER, EVAL 
(1) 
o when to ... 
(2) 
o judge writing 
o edit re org'n 

APPLIC'N, PROB SOLV 
(see write sample) 

SKILL TOTALS 
CONVENTIONS (8) 
GRAMMAR (5) 
WORD USAGE (6) 
ORGANIZATION (12) 

COGNITIVE TOTALS: (1) (27) (3) (31) 

WRITING SAMPLE: 
Scoring: Holistic / Primary Trait / Analytic / Point Syst: 

Number/Type of writing sample: letter, theme, story, other 
Number of readers per sample: 






APPENDIX 10 



DECISION RULES 



Test Analysis: 

Math: for word problems-- 

NN-level 2: simple word problems involving routine manip. 
NN-level 4: hard word problems, 2-step problems 
G-level 4: geometry problems including calculating area 
application, such as carpet or paint 
M-level 4: measurement problems other than the geometry ones 
above 



Note: some reports did not make clear distinctions between subskills that 
were differentiated on the CSE matrices, such as the math example above. 
When tests were not available for analysis, it was necessary to rely on the 
categorization provided by the report. 



Summary: 

1. When tallying the number of items for summaries, if a report says a 
group of items includes some falling in 2 or more levels of the 
hierarchy of skills, divide equally for purposes of counting items. Be 
sure to count all subskills represented. E.g., report says there are 
20 word problems including some that are 1-step (easy) and some 2-step 
(hard): count 10 items in level 2 of the hierarchy and 10 items in 
level 4. Also, count 1 subskill in level 2 and 1 in level 4. 



2. When report did not mention number of items, only the number of 
subskills might be countable. 

3. When report did not mention number of items or even which subskills of 
a skill area were tested, then only a check could be recorded for the 
subskill indicating that it was tested in some (unknown) fashion. 

4. Note that there are many ways of dividing or grouping skills or 
objectives, and that the subskills used for classification purposes in 
this study are not necessarily "better" than some other scheme; they 
are just different and were useful for this study. 
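
The tallying rules in the Summary can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the procedure the coders actually ran: the function name tally_group is hypothetical, hierarchy levels are represented as the integers 1 to 4, and exact fractions are used so that an odd item count still divides equally.

```python
from fractions import Fraction


def tally_group(levels, n_items=None):
    """Tally one reported group of items across hierarchy levels.

    levels  -- hierarchy levels (1-4) the group is said to span
    n_items -- number of items in the group, or None if the report
               did not give it (then only subskills are countable)

    Returns (items_per_level, subskills_per_level).
    """
    levels = sorted(set(levels))
    # One subskill is counted in every level the group touches.
    subskills = {level: 1 for level in levels}
    if n_items is None:
        return {}, subskills
    # Divide the items equally among the levels.
    share = Fraction(n_items, len(levels))
    return {level: share for level in levels}, subskills


# The example from rule 1: 20 word problems, some 1-step and some 2-step.
items, subskills = tally_group([2, 4], n_items=20)
print(items)
print(subskills)
```

Using Fraction rather than integer division is a design choice for the sketch: a group of 5 items spanning 2 levels would be recorded as 5/2 per level instead of being rounded.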



205 



APPENDIX 11 



CATEGORIES OF "SOURCE QUALITY" 

1. TEST (e.g. Montana, Illinois) 

2. REPORT: straightforward, with clear, single skills and number of each 
type of items 
(e.g. Kansas, Louisiana, New Jersey, Missouri) 

3. REPORT: reasonably good item specifications, however... 

a.) broad domains or clusters of skills that do not fit the 
subskills in the Content-by-Skill-Hierarchy Matrix, so 
cannot assign exact number of items per subskill even 
though the report is otherwise clear and may provide 
sample items. 
(e.g., California, Maryland) 
b.) no info on number of items per subskill 
(e.g., Texas) 

4. REPORT: list of "objectives" is too brief to be certain what items 
really measure; does provide number of items per objective. 
(e.g. Alabama) 

5. REPORT: list of objectives, skills or domains very brief or vague; 
although may give a few sample items, report is not clear on 
what exactly is being measured. Does not provide number of 
items per objective. 
(e.g. Pennsylvania) 

6a. REPORT: extremely vague or brief report mentioning only some of the 
skills tested, usually without information on the number of 
items on the test and without grade delineation. 
(e.g. New Hampshire) 

6b. INSTRUCTIONAL MATERIALS: vague as to what exactly is to be tested and 
no information on number of items per subskill. 
(e.g. South Carolina) 

6c. LETTER: mentions test exists but gives no specific information. May 
be a new program. 
(e.g. Virginia, Mississippi) 



NOTE: Some states provided different sources of information on different 
tests or content areas. In this case, more than one rating was 
given as appropriate. 



NOTE: The above 6-point scale is ordinal only. 
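
Because the scale is ordinal, ratings can be ranked and a median taken, but averaging them would not be meaningful. The sketch below shows one way to respect that. It is illustrative only: main_category is a hypothetical helper, the sub-letters (a, b, c) are deliberately ignored when ordering because the text gives no order among them, and the sample ratings are the states named as examples in the category list above.

```python
from statistics import median_low


def main_category(rating):
    """Return the leading 1-6 category of a rating such as '3a' or '6c'."""
    return int(rating[0])


# States cited as examples in the category list.
ratings = {
    "Montana": "1",
    "Kansas": "2",
    "California": "3a",
    "Texas": "3b",
    "Alabama": "4",
    "Pennsylvania": "5",
    "New Hampshire": "6a",
}

# Rank states from best-documented source to worst; ties keep input order.
ranked = sorted(ratings, key=lambda state: main_category(ratings[state]))

# A median is a valid summary of ordinal data; a mean is not.
middle = median_low([main_category(r) for r in ratings.values()])

print(ranked)
print(middle)
```

median_low is used instead of median so that the summary is always one of the actual categories rather than a value halfway between two of them.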




206 



APPENDIX 12 



Comments on Sources of Information 
and Quality of Information 



Rating 
5 

1;5 



State Comment 
ALABAMA - (Rpt.) 



Gives brief objectives and number of items 
(can't tell what items are really like) 



ALASKA - 4th gr. test; 

8th gr. - (Rpt.) (skills mentioned but brief; no information 

on exact items) 



3a 



2a 



ARIZONA - (CAT) 

ARKANSAS - (Rpt.) - Report mentions appendix with list of 

objectives, but not sent to us. Report only 
lists major domains with # of subjects and 
items each - so isn't as helpful - can't tell 
how to match our subskills, i.e., what's really 
measured. 

CALIFORNIA - (Rpt.) - (Broad domains ≠ ours; can't assign #'s of 

items per subskill) 
- Good documentation otherwise; 12th grade = 
briefest; 8th grade most recent and best done 
re higher order skills. 

COLORADO - no program 

CONNECTICUT - no Info 

DELAWARE - CTBS 



2 - math 

3 - reading 



FLORIDA - (Tech.) 
GEORGIA - (Rpt.) 



subskills easier to identify from report in 
math than in reading and writing. 

very brief title of objs. so can't be sure 
what's measured or # of items. 



2 
1 



HAWAII - no Info. 

IDAHO - (Rpt.) - straightforward; # of items 

ILLINOIS - (test) - note: many items are same on 4th and 8th - 

and on 8th and 11th 

INDIANA - (Rpt.) - Very brief "obj" with item #'s - unsure what 

their "objs" really are. 
- Different items for grades 6 & 8 but areas 
are same and same # of items. 






207 



IOWA - no prog. 

2 KANSAS - (Rpt. & - Straightforward with # of items per obj. 

list of 
objs) 

KENTUCKY - CTBS-U 

2 LOUISIANA - - Straightforward with # of items per obj. 

(Legis. Rpt.) - State assessment 

(Annual Rpt.) - Basic Skills 

2 MAINE - (Summary - Straightforward with # of items; extensive 

Rpt.) information on scoring of writing sample. 

5 MARYLAND - (Rpt., - Specs = OK, but lump together several objs. 

Specs.) under 1 domain - and don't give # of items. 

MASSACHUSETTS - no single 

statewide test; 
local choice 

1 for R&M 

MICHIGAN - (Test) - Have writing objectives in Rpt. (?) - but no 
6c for Writing writing test -?? 

5 MINNESOTA - Confusing battery of tests 

Rept on MSRI No details on content for all tests - just 
Rept on MSEA-R brief "area" names which don't match ours 
Rept on MSEA-M well. Some item #s given. 
Rept on Basic Math. 

2 MISSOURI - (Data - Straightforward; gives #'s of items 

Summary) 

1 MONTANA - (Test) 

6c MISSISSIPPI (Rpt.) - New program with no information on content 

other than RMW, grade levels. 

NEBRASKA - no prog. 

2 NEVADA - (Rpt.) - Fairly straightforward "competency areas"; 

gives #'s of items. 

6a NEW HAMPSHIRE - Vague: didn't give specific information on 
(Summary Rpt.) objs. or items and didn't differentiate grade 
levels by skill areas. Some areas and items 
mentioned in discussion of results (no list 
or tables, etc.). 

2 NEW JERSEY - Not real specs - but adequate for us. Gives 

("Dir. of Specs # of items. 
& Items") 






208 



NEW MEXICO - CTBS-U 



3a NEW YORK - Unique test of reading (infer missing words 

(Manuals) in prose passages). Math part of manual 

gives # of items in various content areas but 
uses different categories from ours - so 
can't assign # of items to our subskills. 

5 NORTH CAROLINA - Brief objs. only, no elaboration on content 

(Rpt.) or # of items - a little hard to match to our 
categories/hierarchy. 

NORTH DAKOTA - no prog. 

OHIO - no prog. 

OKLAHOMA - no prog. 

3a OREGON - Last few pages give # of items - but hard to 

(Summary Rpt.) make their categories match ours - theirs are 
large and vague, e.g., "inferential comp.", 
"evaluative comp." 

5 PENNSYLVANIA 

("EQA" Manual) - gives only brief name of item content - so 

tallies are tentative, especially on reading. 
("TELLS" Booklet) - information only on "objs." - and brief; no # 

of items 

RHODE ISLAND - ITBS 

6b SOUTH CAROLINA - Seems to be instructional manual, not specs 

or test manual; also, only covers 9-12 
("Reading T&T whereas test is done at 1-3, 6, 8, 11 - and 
9-12") also only reading, whereas test covers R, M, 

and W. Gives only areas of R tested, not # 
of items, and nothing on Math or writing [not 
very useful]. 

SOUTH DAKOTA - no prog. 

5 TENNESSEE - (Rpt) - Gives obj. and some [sample?] items each, but 

doesn't specify # of items on test. 

3b TEXAS - (Rpt.) - Gives reasonable, good specs and details on 

how specs and items written, but there are 
few objs covered; no information on # of 
items. 

UTAH - CTBS-S 
VERMONT - no prog. 






209 



VIRGINIA - (Rpt. ) - Mentions there is minimum competency test in 

R&M at grade 10 - but gives no other 
information! 



WASHINGTON - CAT 

WEST VIRGINIA - no prog. 

WISCONSIN - (Rp" v s not listed. Only a few could be 

inferred from Rpt. # of items given only for 
whole test and "lit comp." subset. 

WYOMING - no program. 



210 



APPENDIX 13 

STQI PROJECT 
Reading 

Definition and identification of Skills 
CONTENT x SKILL HIERARCHY MATRIX: 

                RECALL      LITERAL COMP     INFER, JUDGE    APPLIC'N 
                            ROUTINE MANIP    EXPLAIN 

WORD ATTACK                 no items         no items        no items 

VOCABULARY                                                   no items 

COMPREHENSION   no items 

STUDY SKILLS    no items                                     no items 



SAMPLE ITEMS 



RECALL / WORD ATTACK : 

1. PHONETICS Look at the picture and the word under it. The 
word has missing letters. Choose the letters that 
are missing in the word. 

(picture of squirrel) 
___irrel 

* a. squ 
b. spr 
c. thr 
d. shr 



2. SYLLABICATION Look at the underlined word and select the response 
in which the word is correctly broken into 
syllables. 

satisfaction 

a. sat-is-fact-ion 
b. satis-faction 
* c. sat-is-fac-tion 
d. sa-tis-faction 



3. AFFIXES & ROOTS 

The root word in narrowing is: 

* a. narrow 
b. rowing 
c. arrow 
d. row 






211 



4. CONTRACTIONS 



Which words mean the same as the underlined word? 



You'll need an umbrella today. 

a. You all 

b. You would 

c. You still 
* d. You will 



5. INFLECTIONAL Which underlined word shows that something happened 
ENDINGS in the past? 

When Eleanor arrives, you should show her the mural 
a. b. c. 
you painted. 
* d. 



RECALL / VOCABULARY: 



1. MEANING IN Which word means about the same as NOVICE? 

ISOLATION 

a. curator 

b. spendthrift 

c. weakling 
* d. beginner 



2. SIGNS What does this sign mean? 

a. don't enter (stop sign) 

* b. stop your car or bike 

c. stop talking 

d. no cars or bikes allowed 



LIT. COMP. / VOCABULARY: 



1. MEANING IN Choose the word that means the same as the 

CONTEXT underlined word in the sentence. 

Each morning Bernard has his customary breakfast of 
oatmeal, toast, and juice. 

a. fancy 

b. special 
* c. usual 

d. strange 




212 



2. MULTIMEANING 
WORD 



Choose the meaning of the underlined word as it is 
used in the sentence. 



The snap has fallen off the collar of my shirt. 

a. to make a sharp, crackling sound 

b. a brief spell of cold weather 
c. to snatch or grab suddenly 
* d. a clasp on an article of clothing 



INFER / VOCABULARY : 
1. ANALOGY 



Choose the word that best fits the blank. 
SMALL is to LARGE as HIGH is to . 



a. tall 

b. tiny 



* c. low 
d. broad 



2. NONSENSE IN 
CONTEXT 



What is the best meaning for the underlined 
letters? 

Sue mras kittens and puppies. 



a. little 
* b. likes 



c. is 

d. softly 



ROUTINE MAN IP (LITERAL COMPREHENSION / COMPREHENSION' 



1. DETAILS (Given passage with explicitly stated detail ... e.g.) 

A shock victim's skin is cold and may be moist 
to the touch. Pulse is fast and often too faint to 
be felt at the wrist. Breathing is rapid and 
shallow, and the victim feels weak and dizzy. 

A person who is in shock is most likely to: 



* a. feel dizzy 

b. have a strong pulse 

c. feel warm 

d. take deep breaths 



* Correct answers are not marked with an asterisk in items where reading 
passages have been omitted. 






2. MAIN IDEA (Given passage with explicitly stated main 

idea ... e.g.) 

At first glance, the prairie resembles little 
more than a barren and lonely expanse of grass, but 
in fact, the prairie is teeming with life. Among 
the most interesting inhabitants of the prairie are 
the harvester ants. Named for their habit of 
collecting seeds, these industrious insects are 
well suited to prairie life, 
... (several more paragraphs about ants) 

What is the main idea of this passage? 

a. Harvester ants are well suited to life in the 
prairie. 

b. Harvester ants' mounds are made of dirt. 

c. Harvester ants hibernate during the winter. 

d. A colony of harvester ants can collect a pint 
of seeds per day. 



3. REFERENTS (Given a passage, identify referent of a pronoun or 

word that functions like a pronoun.) 

According to the story, who or what "sank slowly to 
the ground"? 

a. the mule c. the horse 

b. the goat d. the master 



4. SEQUENCE (Given passage, identify explicitly stated sequence 

of events.) 

Which of the following happened last? 

a. Jefferson became a musician 

b. Jefferson wrote the Declaration of Independence 

c. Jefferson was elected President 

d. Jefferson designed his own home 



5. CAUSE / EFFECT (Given passage, identify explicitly stated cause or 

effect.) 

Why did Linda stop in front of the house? 

a. She saw a kitten 

b. The children said the house was haunted 

c. The house was old and big 

d. She wanted to know what made the noise 






6. FOLLOW DIRECTIONS (E.g., given application form, identify correct way 

to fill it out according to written directions.) 



On line 1, William should write the date on which 
he: 

a. left his previous job 

b. completes the application 

c. began his first job 

d. is available for work 



INFER, EVALUATE / COMPREHENSION : 

1. DETAILS, SUPPORT (Given passage,) 
STATEMENTS 



Which statement best supports James Lee's claim 
that the late bus would benefit students? 



a. The school board should find a way to resume 
the services of the late bus 

b. Extracurricular activities provide students 
with valuable learning experiences 

c. Some students can get rides from their 
parents 

d. Some working parents cannot take their 
children home from school 



2. MAIN IDEA, (Given passage, infer best title, summary 

SUMMARY, TITLE statement, title) 

The main idea of these rules is that: 



a. both adults and children enjoy the swimming 
pool 

b. there is a snack bar at the swimming pool 

c. safety is extremely important at the swimming 
pool 

d. the swimming pool is open every day 



3. MISSING / IRRELE- (Given passage, infer missing information or 
VANT INFORMATION identify important information to include or 

exclude) 

Which of the following would be most important for 
the editors to include in this editorial? 

a. The school has never given the band any money 
for its uniforms 

b. Helmets and padding protect football players 
from injury 

c. Members of the marching band perform indoor 
concerts too 

d. The football team has longer practices than 
the marching band 




4. MISSING WORDS 



(Given reading passage with several words omitted, 
identify best word to fit in blank from context.) 
(Note: New York's entire reading test was like 
this) 



5. SEQUENCE 



(Given a passage, infers order of events or logic) 

What indicates that Minnie was the first in her 
neighborhood to have a sewing machine? 

a. The neighbor women all came to see it 

b. She had to make everyone's clothes 

c. Fred bought it 

d. She didn't know how to operate it at first 



6. CAUSE / EFFECT 



(Given passage, infer cause or effect) 

A major reason Paramount Studio moved to California 
was to - 

a. allow the Army to use the Astoria plant 

b. avoid the destruction of the studio by 
vandals 

c. enable the Astoria plant to become a museum 

d. be able to make movies less expensively 



7. CONCLUSIONS 



(Given passage, chart, etc., draw conclusions) 

Based on the information in this chart, it may be 
concluded that: 

a. cross-ventilation helps to warm a room 

b. gas heat is more expensive than electric heat 

c. fans use very little electricity 

d. insulating walls conserves energy all year 
round 



8. PREDICTIONS 



(Given passage, predict probable outcome) 
What probably happened next in this story? 

a. The girl became angry and went home 

b. Marina and the girl told each other their 
names 

c. The girl made fun of Marina 

d. Marina became embarrassed and stopped talking 






9. FACT / OPINION (Given passage or statement, distinguishes fact 

from opinion) 

Which of the following is an example of an opinion? 

a. "In 1860, a midwestern stagecoach company let 
people know about an exciting new plan." 

b. "The mail must go through." 

c. "The route cut directly across from Missouri 
to Sacramento." 

d. "Each rider rode nonstop for about 100 
miles." 



10. PURPOSE, (Given passage, infer author's purpose or attitude) 

ATTITUDE 

The author's attitude toward the Pony Express 
riders can best be described as one of 

a. confusion c. worship 

b. amusement d. admiration 



11. CHARACTER (Given passage, identify character traits, identify 

motivations, draw conclusions about character's 
feelings) 

The beasts and birds can best be described as 

a. proud and closed-minded 

b. understanding and wise 

c. sleepy and lazy 

d. thrifty, hard-working 



12. FIGURATIVE (Given passage, identify meaning of metaphor, 

LANGUAGE simile, idiom, or other image or figure of speech 

used) 

The author's choice of words "sets up business" and 
"cleaning station" are used to show that 

a. the wrasse's means of getting food is almost 
like a business service 

b. wrasse fishing is big business 

c. all fish set up stations 

d. the wrasse enjoys cleaning itself in the 
water 






13. TONE 



(Given passage, recognize mood) 



At the beginning of the story, the mood is one of 

a. disappointment and sorrow 

b. curiosity and excitement 

c. fear and suspense 

d. thankfulness and joy 



14. COMPARE (Given passage, infer similarities, differences) 

CONTRAST 

Compared to American managers, Japanese baseball 
managers are - 

a. better advisors 

b. better paid 

c. more knowledgeable 

d. more powerful 



15. ORGANIZATION (Given passage, select portion to complete 

outline or organizer based on organization of 
passage) 

The following outline is based upon the last 
paragraph of the passage. Which topic below is 
needed to complete it? 

I. 

A. Federalists 

B. Republicans 

a. Competing parties c. Election pay-offs 

b. Jefferson's rivals d. Strong governments 



16. SETTING, PLOT (Given passage, identify and interpret time, place 

DIALOGUE of story or event) 

You can tell that this story took place 

a. in a city park c. in a forest 

b. at a zoo d. near a boot factory 



17. LIT TYPE (Given passage, recognize example of fiction, 

nonfiction, biography, autobiography, similes, 
metaphors, etc.) 

The reading selection appears to be an example of 

a. an autobiographical account 

b. historical fiction 

d. ancient mythology 




APPLICATION / COMPREHENSION: 



1. RELATE TO NEW 
SITUATION 



(Given passage, relate ideas in it to situation not 
discussed) 

Suppose your student council could not succeed in 
accomplishing any improvements for the student body 
because of many conflicts and divisions among the 
student members. Which of the following would be a 
way of applying Thomas Jefferson's beliefs to such 
a situation? 

a. avoid the meetings so as not to waste time 

b. try to unify the members to create an 
effective council 

c. encourage the disagreements to create 
livelier debates 

d. appoint one person to make all the decisions 



ROUTINE MANIP. / STUDY SKILLS: 



1. USE INFORMATION 
SOURCES 



(Includes dictionary entries and guide words, 
tables of contents, indexes, glossaries, 
encyclopedias, phone books, and other written 
information sources) 

(Given a dictionary entry...) 

Choose the definition that best fits how the 
underlined word is used in this sentence: 

I can't trim your hair with these dull scissors. 



a. v. 1 

b. v. 2 



c. n. 

d. adj. 



2. USE CARD 
CATALOG 



(Given title card...) 

Who is the author of Brother of the More Famous 
Jack? 



a. Black Swan 

b. Barbara Trapido 



c. Victor Gollancz 

d. Transworld Publishers 






3. USE MAPS, 
CHARTS 



(Given maps, charts, etc., to locate specific 
information or answer questions) 

(Given chart of population of major U.S. cities...) 

Which city had the least change in population 
between 1970 and 1980? 



a. Philadelphia 

b. Chicago 



c. Houston 

d. Los Angeles 



4. ALPHABETIZE 



Choose the word that comes first in alphabetical 
order. 



a. solve 
* b. sob 



c. south 

d. sort 



EVALUATE, JUDGE / STUDY SKILLS: 



BEST Where would you look to find a list of all the 

SOURCE presidents of the United States? 



* a. an encyclopedia c. a dictionary 
b. a newspaper d. an atlas 



ATTITUDE: I enjoy reading. 



a. strongly agree 

b. agree 

c. not sure 

d. disagree 

e. strongly disagree 






MATH SAMPLE ITEMS 



Quality Indicators Project 
Math 



CONTENT x SKILL HIERARCHY MATRIX: 

                 RECALL           ROUTINE     EXPLAIN,      PROBLEM 
                                  MANIP.      TRANSLATE,    SOLVING 
                                              JUDGE 

NUMBERS,         1. order 
NUMERATION       2. number line 
                 3. identity 

VARIABLES, 
RELATIONSHIPS 

GEOMETRY 

MEASUREMENT 

STATISTICS,      (empty)                      (empty)       (empty) 
PROBABILITY      i.e., no 
                 test items 

SAMPLE ITEMS* 



RECALL / NUMBERS : 

1. ORDER What shows the correct relation of 7, 9, & 16? 

* a. 7 < 9 < 16 c. 16 < 9 < 7 

b. 7 > 9 > 16 d. 16 > 9 < 7 



* Sample items are presented for all cells in which test items occurred. 
Not every subskill in every cell is represented here, but the most 
frequent and characteristic ones are. 






RECALL / NUMBERS (CONT. ) 



2. NUMBER LINE 



What number is represented at point S on the number 
line? 

(number line marked -1, 0, 1, with point S indicated) 



* a. -1/2 
b. -1 1/2 



c. 1/2 

d. 1 



3. IDENTITY ELEM. 



What value for "n" makes the sentence below true? 
100 ÷ n = 100 



a. 0 

b. 0.01 



* c. 1 
d. 100 



4. MATH FACTS 



4 x 2 = 

* a. 8 
b. 9 



c. 6 

d. 2 



5. SYMBOLS/WORDS Which number means "three hundred sixty-two"? 



a. 352 

b. 3620 



* c. 362 
d. 632 



6. EQUIVALENT 
FRACTIONS 



10/15 = 

a. 3/5 
* b. 2/3 



c. 1/5 

d. 3/2 



ROUTINE MANIPULATION / NUMBERS: 



1. COMPUTE: 

WHOLE NUMBERS 



79 + 34 = 

a. 112 

b. 45 



c. 103 
* d. 113 






2. FACTORS, 
MULTIPLES 



Which shows the prime factorization of 12? 



a. 3 x 4 

b. 1 x 12 



* c. 3 x 2² 
d. 2 x 3 x 2² 



3. NUMBER 
SEQUENCES 



Which number is missing? 1011, 1022, ____, 1044 



a. 1043 
* b. 1033 



c. 1023 

d. 1020 



4. SIMPLE WORD 
PROBLEMS 



A basketball team has won its first 3 games. It 
must play 12 games in all. What percent of the 
total games has the team played? 



* a. 25% 
b. 3% 



c. 33% 

d. 75% 



5. ROUNDING 



Round 0.4088 to the nearest hundredth. 



a. 0.40 

b. 0.408 



c. 0.409 
* d. 0.41 



6. CONVERT 
FRACTIONS, 
DECIMALS, % 



3/4 = 

* a. .75 
b. .34 



c. 3.4 

d. 75.0 



EXPLAIN, JUDGE / NUMBERS 



1. FORMULATE 
PROBLEM 



JoAnn works 4 hrs a day for 4 days a week. She 
earns $4.25 an hour. She wants to earn enough 
money to buy a refrigerator for $585. 

Which problem cannot be solved with the information 
given above? 

a. How much money does JoAnn earn each week? 

b. How many days must JoAnn work to buy the 
refrigerator? 

c. How much more money would JoAnn earn each 
week if she is paid $5.00 an hour? 

* d. What is the capacity of the refrigerator that 
JoAnn will buy? 






2. IDENTIFY FACTS Joe bought a shirt that regularly sells for $24 on 

sale for $18. What percent off the regular price 
was the sale price? 

What facts are given? 

a. sale price and discount rate 
* b. sale price and regular price 

c. regular price and discount rate 

d. regular price, selling price, and discount 
rate 



3. IDENTIFY A packet of gelatin weighs 20 grams. What is the 

ALGORITHM weight of 10 packets of gelatin? 

Which of the following problems can be solved using 
the same operations as the problem above? 

a. Juanita runs 10 miles in 90 min. How long 
does it take her to run each mile? 

b. A felt pen costs 49¢ and a ballpoint costs 
99¢. How much does a felt pen and a 

ballpoint cost? 

c. It takes 4 ounces of juice to fill a glass. 
How many glasses can be filled from a 

half-gallon bottle of juice? 
* d. A pencil costs 10¢. What is the cost of 4 
dozen pencils? 



4. EVALUATE 

CONCLUSIONS, 
ASSUMPTIONS 



Magdelena got 80% correct on a math test and 85% 
correct on a science test. Ralph said that 
Magdelena got more right answers in the science 
test than in the math test. 

Which of these conclusions about Ralph's statement 
is correct? 

a. Ralph's statement is true under all 
conditions. 

b. Ralph's statement cannot be true under any 
condition. 

* c. Ralph's statement is true if the tests each 
have the same number of questions, 
d. Ralph's statement cannot be true if the tests 
each have the same number of questions. 



5. COMPUTATIONAL 
ESTIMATION 



Estimate the product: 89.61 x 10.42 

a. 9000 b. 1200 * c. 900 d. 100 






PROBLEM SOLVING / NUMBERS: 



1. ESTIMATE IN 
WORD PROBLEM 



The payroll of a grocery store for its 23 clerks is 
$395,421. What is the average salary of a clerk? 

What is the best estimate of the answer? 



* a. $20,000 
b. $17,192.22 



c. $20.00 

d. $1300 



2. HARD WORD 
PROBLEMS OR 
2-STEP PROBLEMS 



With 5 games to play, Steve had 187 hits. In his 
next four games, he got 1, 4, 2, and 3 hits. How 
many hits must he get in his last game to have a 
200-hit season? 



a. 
* b. 



c. 10 

d. 13 



RECALL / VARIABLES: 



1. SYMBOLS 



Choose the symbol that makes the number sentence 
true: 



3 + 4 □ 8 



a. - 

b. > 



* c. < 
d. ■ 



ROUTINE MANIP. / VARIABLES: 



1. EQUATIONS, 
INEQUALITIES 



If x is replaced by 3, then the value of x² - 1 is 



a. 2 

b. 5 



* c. 8 
d. 11 



2. GRAPH POINTS 



The point F is named by: 



* a. (2,3) 

b. (3,2) 

c. (3,3) 
d. (2,4) 



(coordinate grid with axis ticks 1 2 3 4 5) 






3. READ GRAPHS 



(given a line graph . . . ) 
In what year did the Tigers win 15 games? 
a. 1980 b. '81 c. '82 * d. '84 



EXPLAIN, JUDGE / VARIABLES : 



1. ORGANIZE DATA Put these test scores into a frequency table: 

IN GRAPH, TABLE 

Sue scored 5, John 6, Tony 9, Sarah 10, Tom 6, 
Brad 7, Jenny 9, Kate 6. 



2. EQUATION FOR John and Tom have 10 books between them. Tom has 2 

INFORMATION more books than John does. 



Which pair of equations describes this information? 



a. J + T = 10, J + 2 = T 

b. J » T = 10, J + 2 = T 

c. J + T = 10, J = 2 + T 

d. J - T = 10, J - 2 = T 



PROBLEM SOLVING / VARIABLES : 

1. GRAPH PROBLEM (given a line graph of number of arrests by 

years . . . ) 

During which 2 years were the same number of people 
arrested for drunken driving? 

* a. 1975 & 1976 c. '77 & '78 

b. '78 & '80 d. '77 & '79 



2. FORMULA Find the volume of the pyramid with a rectangular 

base using the formula V = Bh/3 (B = area of base, 
h = height) 



a. 30 cubic inches 

b. 32 cubic inches 

c. 90 cubic inches 

d. 96 cubic inches 



RECALL / GEOMETRY : 

1. DEFINE TERMS Which figure shows a ray? 

a. ' 5 * c. 

O b. ^ * d. 




2. CONCEPTS 



This figure is a square: 

What is the measure of angle I? 



a. 30 

b. 45 

c. 60 
* d. 90 



ROUTINE MANIPULATION / GEOMETRY: 



1. AREA, CIRCUM., 
PERIM., VOL. 
SIMPLE 



Find the area of the rectangle below: 



a. 25 m 

b. 42 m 

c. 68 m 

d. 136 m 



8 m 



17 m 



2. CORRESPONDING 
SIDES, ANGLES 



Given that the figures below are similar, the 
measure of F is the same as the measure of 



a. H 
b. M 
c. N 
d. O 



3. SHAPES 



The figure below shows the part of a figure on one 
side of a line of symmetry, m. Which answer choice 
shows the complete figure? 



c. 



JUDGE, EXPLAIN / GEOMETRY: 



1. 



SHAPES FROM 
OTHER VIEW 



Figure F below shows a block with one corner cut 
off and shaded. Which answer shows a figure of how 
this block would look when viewed from directly 
above it? 



a. 



c. 




Fig. F 






2. ESTIMATE 



Estimate the size of the angle below. It appears 
to be between: 



* a. 0 and 45 

b. 45 and 60 

c. 90 and 135 

d. 135 and 180 



PROBLEM SOLVING / GEOMETRY : 
1. APPLY THEOREMS 



Which of the following statements is true about a 
square and a triangle both inscribed in the same 
circle? 

a. The area of the square is greater than the 
area of the triangle. 

b. The square and the triangle have the same 
perimeter. 

* c. The arc of the triangle is greater than the 
arc of the square. 
d. The perimeter of the triangle is greater than 
the perimeter of the square. 



2. WORD PROBLEMS 



Robert must choose one of 4 solid chocolate candies 
to buy. Which one of the following shapes will 
give him the MOST chocolate for his money? 

* a. Cube one inch on a side. 

b. Sphere one inch diameter 

c. Cylinder one inch in height and one inch in 
diameter 

d. Pyramid one inch in height with a one inch 
square base 

e. Cone one inch in height and one inch in 
diameter 



RECALL / MEASUREMENT: 



1. EQUIVALENTS 



How many inches equal one yard? 



a. 30 

b. 35 



* c. 36 
d. 39 



2. ORDER 



Which month comes next after April? 

* a. May b. March c. June d. February 






ROUTINE MANIPULATION / MEASUREMENT : 



1. READ INSTRUMENTS 



What time is it? 

a. 3:00 

* b. 3:30 

c. 4:00 

d. 4:30 



2. CONVERSIONS 



One meter equals 

* a. 39.14 in. 
b. 36 in. 



c. 3 yards 

d. 41 in. 



EXPLAIN / MEASUREMENT: 



1. IDENTIFY BEST 
UNIT TO USE 



Which unit is best for measuring the distance 
between two cities? 



* a. kilometer 
b. centimeter 



c. liter 

d. kilogram 



2. ESTIMATE 



Which object would be about 4 meters long? 



a. bicycle 
* b. automobile 



c. shoe 

d. baseball bat 



PROBLEM SOLVING / MEASUREMENT: 



1. WORD PROBLEMS 
OBJECTS 



A map of a state is to be drawn so that one-fourth 
inch represents five miles. If the real distance 
between two points in the state is 20 miles, how 
many inches apart should these two points be on the 
map? 



a. 1/2 inch 

b. 3/4 inch 



c. 1 inch 

d. 1 1/4 inch 



2. ESTIMATE 
MEASUREMENTS 
IN WORD PROB 



(Given a map . . . ) 

Using Routes 21 and 222, what is the approximate 
distance from Crestline to Pleasantburg? 



a. 12 mi. 
* b. 20 mi. 



c. 30 mi. 

d. 35 mi. 






RECALL / STATISTICS: 



NONE 



ROUTINE MANIP. / STATISTICS: 



1. COMPUTE MEAN, 
MEDIAN, MODE 



From Monday through Thursday, Roman's News Stand 
sold 17, 36, 41, and 30 copies of the Town News . 
What was the average number of papers sold per 
day? 



* a. 31 
b. 33 



c. 114 

d. 124 



2. COMPUTE 

PROBABILITIES 



The sectors of the spinners are colored red (R), 
blue (B), and green (G). What is the probability 
that the spinner will stop on the blue if you spin 
it one time? 



a. 1/2 

b. 2/3 

c. 1/4 
* d. 1/3 



EXPLAIN / STATISTICS: 



NONE 



PROBLEM SOLVING / STATISTICS: 



(DRAW INFERENCES— NONE) 




STQI PROJECT 



Writing 

Definition and Identification of Skills 
CONTENT x SKILL HIERARCHY MATRIX 





                 RECALL       ROUTINE MANIP.   INFER, EVAL.   APPLIC'N 

CONVENTIONS      no items                      no items       (see writing 
                                                              samples) 

GRAMMAR                                        no items       " 

WORD USAGE       no items                      no items       " 

ORGANIZATION     no items 









SAMPLE ITEMS 



ROUTINE MANIP. / CONVENTIONS: 



1. CAPITALIZATION 

Mark the answer that completes the sentence 
correctly. The longest river in the United States 
is the ____. 

a. Mississippi river 

b. Mississippi river 
*c. Mississippi River 

d. Mississippi River 



2. PUNCTUATION 

Mark the answer that completes the sentence 
correctly. Our high school band includes ____ 
trumpets, and drums. 

a. clarinets 

b. clarinets; 
*c. clarinets, 

d. clarinets. 



3. ABBREVIATIONS 

The abbreviation for "street" is: 
*a. st. 

b. st 

c. stt 

d. s. 



4. SPELLING 

Choose the correct spelling of "9" 
a. nin b. nien *c. nine d. nein 



5. SUFFIXES, 
AFFIXES 

Choose the letter or letters needed to spell the 
word correctly. 

We will go swim____ every day. 

a. ing *b. ming c. eing d. in 



6. PLURALS 

Choose the word which completes the sentence 
correctly. 

My two front ____ are missing 

a. tooths *b. teeth c. teeths d. tooth 






7. CONTRACTIONS 

Choose the word which completes the sentence 
correctly. I ____ seen her all day. 

a. hav'ent b. hav'nt *c. haven't d. havent 



RECALL / GRAMMAR: 

1. SENTENCE 
TYPES 

Choose the interrogative sentence: 
*a. What should we do about it? 

b. Let's go to the store in an hour. 

c. What a sight that must have been! 

d. Marina checked out the book I wanted. 



ROUTINE MANIP. / GRAMMAR: 



1. COMPLETE 
SENTENCES 

Choose the one which will form one or more complete 
sentences. 

We go camping to get away from 

a. crowds. To enjoy the peace and quiet. 

b. crowds, we enjoy the peace and quiet. 
*c. crowds. We enjoy the peace and quiet. 

d. crowds. Enjoying the peace and quiet. 



2. SUBJECT, 
PREDICATE 

Choose the one which will form one or more complete 
sentences. 

The school carnival ____. 

a. next week c. lots of fun 

b. games and prizes *d. is coming 



3. COMPOUND OR 
COMPLEX 
SENTENCES 

Choose the one below which combines the numbered 
sentences in the best way. 

1. Ladybugs are beetles 

2. Ladybugs are small 

3. They feed on insects 

* a. Ladybugs are small beetles that feed on 
insects. 

b. Ladybugs are beetles, and they are small, and 
they feed on insects. 

c. Ladybugs feed on insects, and they are beetles, 
and they are small. 



4. MISPLACED 
MODIFIERS 

Which of the following revisions, if any, corrects 
the grammar in this sentence: 
You can call your mother in London and tell her all 
about George's taking you out to dinner for just 
sixty cents. 

*a. Move "for just sixty cents" to the beginning. 

b. Change "George's" to "George" 

c. Change "can call" to "could call" 

d. Move "in London" to the end. 



5. PARALLELISM 

Mark the letter for the location of the error in 
this sentence: 






Students in our French class like reading better 
a. b. c. 

than to work. 



ROUTINE MANIP. / WORD USAGE: 



1. LANGUAGE CHOICES 
(specificity, 
senses, tone) 

Select the one which suggests an unfriendly attitude 
from Mr. Houser. 

Mr. Houser ____ that we pay the bill. 

a. asked *b. demanded c. requested 



2. SUBJECT-VERB 
AGREEMENT 

Mark the letter for the location of the error. 
Because Tyrone is really afraid of snakes, he don't 
a. 
want to go hiking with us. 
d. 



3. TRANSITION 
WORDS 

Choose the word that best completes each sentence. 
You may use the same word more than once. 

To be a skillful debater, you must be able to argue 
both sides of an issue. (1) ____ study the side 
that you will defend. (2) ____ test your position 
with arguments from the opposing side. (3) ____ 
this may become a tedious task, it is usually the 
most prepared debater who wins. 
a. Then b. First c. Although d. Otherwise 
(2) (1) (3) 



4. DOUBLE 
NEGATIONS 

Choose the one that completes the sentence 
correctly. He didn't buy ____ popcorn. 

a. no b. any c. none 



5. PRONOUNS 

Mark the letter for the location of the error. 
He spoke bluntly and angrily to we spectators. 
a. b. *c. 



6. VERB FORMS 

Choose the one that completes the sentence 
correctly. Every day I walk to work, but Bob ____. 
a. run *b. runs c. runned d. ran 



ROUTINE MANIP. / ORGANIZATION: 



1. SENTENCE 
MANIPULATION 

Mark the sentence below which expresses the thought 
most effectively and economically. 

a. He spoke to me in a very warm manner when we met 
each other Tuesday. 

b. When we met Tuesday, I was spoken to in a very 
warm manner by him. 

c. His manner was very warm when meeting and 
speaking to me Tuesday. 

d. Tuesday he greeted me warmly. 



2. SEQUENCE 
SENTENCES 

Choose the best order to arrange sentences into a 
logical paragraph. 

1. At the first traffic light, you'll see a red 
brick house on the corner. 






2. To get to my house, turn right after you leave 
the school and walk straight for three blocks. 

3. Walk down that street until you see a house with 
a blue porch. That's my house. 

4. Turn left there. 

*a. 2-1-4-3 b. 2-3-1-4 c. 3-4-1-2 d. 1-2-3-4 



3. SELECT TOPIC 
SENTENCE 

Choose the sentence which is the best topic sentence 
(main idea) for the paragraph. 

____. You should try to stay away from 
trees and telephone wires... (paragraph continues) 
a. It is so much fun to make a kite. 
*b. When you're flying a kite, there are several 
things you should keep in mind. 

c. It is so much fun to fly a kite. 

d. When you're buying a kite, you should remember to 
take enough money with you. 



4. SELECT 
IMPORTANT 
DETAIL 

Choose the best supporting detail for the main idea 
expressed by the sentence: 

My youngest brother was frightened on his first day 
of school. 

a. My father was afraid of school when he was 
younger. 

b. He already knew the alphabet. 

*c. He cried and clung to my father's hand. 
d. The teacher was friendly and encouraging. 



5. SELECT INFO 
TO INCLUDE 



The following outline was used in writing the 
paragraph below it. Choose the sentence needed to 
complete the paragraph according to the outline. 

I. Athletes don't get fat 

A. Example: tennis players 

B. Other examples: gymnasts and wrestlers 

C. Conclusion: strict diets 



Most successful athletes don't allow themselves 
to become fat, because extra weight slows them down. 
____. If they are 10 pounds overweight, they may be 
slowed down... (para. continues) 
a. There are many sports which I enjoy watching. 
*b. Tennis players, for example, have to move with lightning 
speed. 

c. You can play tennis at any age. 

d. Staying on a diet is difficult. 

6. LETTER FORMAT Mark the letter for the location of the error. 

(Given letter with underlined elements...) 
*a. (lack of complimentary close) 






APPLICATION / ORGANIZATION: 



1. EDIT ORG'N 

You are to make decisions about what should be 
revised to improve the selection below. The 
underlined sentences are the ones about which there 
are questions. Use the KEY below to make judgments 
about each of the sentences. 

KEY: 

What is your best decision about the underlined, 
numbered sentences? 

A. KEEP. It is all right where it is. 

B. TAKE OUT. It doesn't fit anywhere. 

C. CHANGE. It is not clear at all and should be 
said in another way. 

D. MOVE. It should be at another place. 

(Given paragraph with underlined sentences...) 



2. JUDGE 
WRITING 

Read the student letter, and answer the question 
below. 
Dear Mr. Vega, 

I think the tidal pools would be a fun place 
to go for the fifth graders. It would be very 
interesting and fun. Please consider this request 
carefully. 

Yours truly, 
Pat Jones 

Suppose your friend just wrote this letter. What 
advice would help her make it more convincing to the 
principal? 

a. Indent "Dear Mr. Vega." 

b. Add Mr. Vega's address in the upper right-hand 
corner of the letter. 

c. Mention the dangers of going to the tidal pools. 
*d. Add examples of what could be learned by going. 



ATTITUDE: 

1. Good writing is important to me because it helps 
me to get good grades. 

a. strongly agree 

b. agree 

c. neither agree nor disagree 

d. disagree 

e. strongly disagree 

2. Good writing will help me learn to express 
myself. 

a. very unlikely 

b. unlikely 

c. neither likely nor unlikely 

d. likely 

e. very likely 




WRITING SAMPLE 



This part of the writing test consists of one writing exercise in which you 
will be expected to show how well you can write. For the exercise, you 
will write an essay on the stated topic. 

You will have 30 minutes to complete your essay. You may wish to take the 
first few minutes to think about how you will organize what you have to say 
before you begin to write. If you wish to make an outline or any notes, 
use the space for notes provided on the back of this sheet. This space is 
meant to help you plan your essay, but your notes will not be scored. All 
that will be scored is the essay you write on the 2 lined pages provided. 

Do your best to write a clear, well organized essay. You may not use a 
dictionary or any other reference materials during the test. If you finish 
your essay before time is called, read what you have written and make any 
changes that you feel will improve your writing. 

TOPIC: Think of something important that happened in your life. It may 
have been happy or sad, painful or enjoyable. Write an essay in which you 
tell what happened and why it was important to you. 






APPENDIX 14 



KEY TO SUMMARY SHEETS 
MATH, READING, AND WRITING 



ENTRIES IN TABLE 

In some cases, both the number of items and the number of subskills are 
known, in which case both appear in the table. 

Numbers on the left of the slash indicate the number of items on the 
test that fall in that row or column of the matrix. 

Numbers on the right side of the slash indicate the number of 
different subskills from that row or column of the Master Matrix that are 
tested. 

When the number of items is unknown, only the number of subskills (the 
number on the right of the slash) appears in the table. 

When neither the number of items nor the number of subskills is known, a 

appears in the table if the state's materials mentioned that at least 
one subskill in that row or column is on their test. 
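
The entry conventions above can be illustrated with a short parsing sketch. It is only an illustration and not part of the project's procedures: the names `Entry` and `parse_entry` are hypothetical, and any cell without a slash (for example a dash, or the marker used when neither count is known) is assumed here to carry no counts at all.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One summary-sheet cell (hypothetical structure, for illustration)."""
    items: Optional[int]      # number of test items, if known
    subskills: Optional[int]  # number of distinct subskills, if known


def parse_entry(cell: str) -> Entry:
    """Parse a cell such as '32/8' (items/subskills) or '/3' (subskills only)."""
    text = cell.strip()
    if "/" not in text:
        # Assumption: a cell with no slash reports neither count.
        return Entry(None, None)
    left, right = (part.strip() for part in text.split("/", 1))
    # A side that is missing or not a plain number is treated as unknown.
    items = int(left) if left.isdigit() else None
    subskills = int(right) if right.isdigit() else None
    return Entry(items, subskills)
```

Under these assumptions, `parse_entry("32/8")` gives 32 items covering 8 subskills, while `parse_entry("/3")` gives an unknown item count and 3 subskills.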



MATH HEADINGS 
Skill areas: 

N = Numbers & numeration (symbols, properties, computation, word 
problems) 

V = Variables & relationships (algebra, trig, graphing) 
G = Geometry (terms, shapes, formulas, theorems, word problems) 
M = Measurement (metric & US Customary units: terms, conversion, word 
problems) 

S = Statistics & Probability (computation, problems) 

Hierarchy level: 

1 = Recall (facts, definitions, symbols, concepts) 

2 = Routine Manipulation (basic computation, manipulation) 

3 = Explain, Translate, Judge (evaluate, attention to process) 

4 = Problem Solving (apply concepts, operations & facts, word 

problems) 



READING HEADINGS 

The headings in Reading and Writing combine content and skill hierarchy 
since a number of the cells in the full matrix were not tested by any 
state, according to their materials. 

WA = Word Attack (First or Recall level; includes phonetics, affixes, 

syllabication, etc.) 
VOC = Vocabulary (Spread across the Recall level, second or literal 

comp level, and third or infer level of the hierarchy. 

Preponderance of items were at 2nd level.) 






LC = Literal Comprehension (Second or Routine Manipulation/Literal 
level) 

IC = Inferential, Evaluative Comprehension (Third level, except for a 
single subskill involving application of reading to new 
context ... 4th level) 

SS = Study Skills (Primarily at the Second level, using information 
sources; one subskill, judging which source is appropriate, is 
at the Third level) 

AT = Attitude toward reading (no level specified) 



WRITING HEADINGS 

CO = Writing conventions (e.g. spelling, capitalization, punctuation, 
at the Second level) 

GR = Grammar (sentence structure, parts of speech, etc., at the first 
and second levels) 

WU = Word Usage (choice of correct or most appropriate words, at the 
second level) 

OR = Organization (effective sentence manipulation, organization of 
words, sentences and paragraphs, all at the second level except 2 
subskills involving editing or judging organization) 

AT = Attitude towards writing (no level specified) 

SM = Writing Sample (e.g. letter, theme, at the 4th or application 
level, cutting across the above content.) 






APPENDIX 15 



READING 



LIST OF STATES FOR STQI PROJECT 







STATE 


WA 


voc 


1." 3 


IC 


SS AT 


hn 




4-6 

i r 


TP 
it 




AT 


Sub 

SKI 


4 


26 


ALABAMA gj 


32/8 
36/5 


12/6 
15/1 


16/4 
6/2 


16/4 
14/4 


16/4 - 


CAT ^ 3 


20/5 
30/2 


16/4 
8/2 


12/3 
32/8 


12/3 
35/3 




18 
15 


1 




ALASKA 












12/4 


3/* 


7/6 


11/7 


12/3 


mm 


21 




12 


ARIZONA CAT 


36/5 


15/1 


6/2 


14/4 


— — 


CAT - 


30/2 


8/2 


32/8 


35/3 


— 


15 




LI 


ARKANSAS jJJ 


36/9 


— 


36/9 




36/9 — 


SRA 24/6 


— 

30/1 


36/9 
9/2 


36/9 
17/7 


56/14 
6/1 


— 


29 
11 


3 


14 


CALIFORNIA 
COLORADO 


60/3 


30/2 


73/3 


77/4 


30/2 — 


16/1 


54/2 


62/3 


78/16 


30/4 


: 


26 






CONNECTICUT °^ 












CAEP - 
4th — 
6th — 


11/3 
— 


12/4 
12/3 
/3 


34/11 
24/11 
/16 


30/5 
11/4 
/4 


13 
— 


23 
18 
23 




8 


DELAWARE CTBS/U 30/1 


25/3 


21/3 


4/1 




5/1 


40/2 


14/4 


29/8 


20/4 




19 


3 


8 


FLORIDA 


5/1 


15/1 


13/3 


10/2 


5/1 ~ 


9/1 


10/1 


24/4 


5/1 




— 


7 


5 


6 


GEORGIA 






72 


74 


— — 


— 


7 


72 


74 


— 


— 


6 






HAWAII 


72/2 


38/1 30/4 
~ no Info. ■— 


30/7 


10/2 — 


60/1 


36/1 


30/4 


30/7 


20/3 




16 






IDAHO 


























1 




ILLINOIS 














A/1 
0/ X 


0/9 


A/A 






7 


4 


14 


INDIANA 
IOWA 




15/3 


25/5 


30/6 


_ — 


— 


15/3 


25/5 


35/7 


— 


— 


15 


c 


1 1 


KANSAS 


18/2 


6/2 


9/3 


9/3 


3/1 — 

•Jl X 


6th 12/3 


9/2 


9/3 


12/4 

14/ H 

12/4 


lfl/fi 

xo/ u 

18/6 




4U 

18 




8 


KENTUCKY 


30/1 


25/3 


21/3 


4/1 




5/1 


40/2 


14/4 


29/8 


20/4 




19 


2 


10 
14 


2nd 8/1 
LOUISIANA 3rd 36/5 


16/2 
4/1 


20/5 
20/5 




12/2 - 
12/3 


20/4 


20/1 


4/1 


12/3 


16/4 




13 


2 




MAINE 
















15/3 


13/4 


12/4 




11 


4 


12 


MARYLAND CAT 
MASSACHUSETTS 


36/5 


15/1 


6/2 
Local 


14/4 




CAT - 


30/2 


8/2 
Local 


32/8 


35/3 




15 


1 




MICHIGAN 








r 




6/1 


15/3 


18/4 


27/8 


9/4 


15 


21 



KEY: 

WA = Word Attack 
VOC = Vocabulary 
LC = Literal Comprehension 
IC = Inferential Comprehension 
SS = Study Skills 
AT = Attitude 

# of items/# of subskills 

? = unknown # of items and subskills 






READING 









1 ~ J 












4 - a 






Sufi- 




STATE 


WA 


VOC LC IC 


SS 


AT 


WA 


VOC 


LC 


IC 


SS 


AT 




6c 


MISSISSIPPI 




NEW 










NEW 










c 

3 


MINNESOTA 








1 
1 




41/4 




*\ * } 






(7) 


2 


MISSOURI 














2/2 


9/3 


2/2 




7 


1 


MONTANA 
NEBRASKA 




No Program 






36/6 


IS/2 


8/4 


26/4 


33/4 


15 


21 


2 


16 NEVADA SAT 


72/2 


38/1 30/4 30/7 


10/2 




601 


36/1 


30/4 


30/7 


12/3 


— 


16 


6a 


NEW HAMPSHIRE 










7 


? 


? 


? 


73 




(?) 


2 


NEW JERSEY 




Local Choice 










Local 


Choice 










8 NEW MEXICO CTBS/U 


30/1 


25/3 21/3 4/1 






5/1 


40/2 


14/4 


29/8 


20/4 




19 






II 


DPP" (Infer missing word) 


















3a 


1 NEW YORK 




— - 56/1 












77/1 






11 




12 NORTH CAROLINA CAT 36/5 


15/1 6/2 14/4 




_ 




30/2 


8/2 


32/8 


35/3 




15 




NORTH DAKOTA 




No program 






















OHIO 




No program 






















OKLAHOMA 




No program 




















3a 


OREGON 










19/5 


9/2 


16/4 


4/3 


12/5 




/19 


c 


12 PENNSYLVANIA 
RHODE ISLAND 




72 75 74 


71 






?2 


?2 

* low 


?6 


74 




/14 




CRT 


7 


IZ IZ /8 


IZ 




7 


7 


IZ 


/a 






/12 


6 


14 SOUTH CAROLINA 






















CTBS/U 








5/1 


40/2 


14/4 


29/8 


20/4 


— 


19 




SOUTH DAKOTA 




No program 






















TENNESSEE 
























36 


9 TEXAS 


IZ 


IZ /3 a 


11 






/l 


IZ 


/3 


IZ 




9 




UTAH CTBS/S 












40/1 


12/3 


2-"9 


20/3 




16 




VERMONT 




No program 






















VIRGINIA SRA 












30/1 


9/2 


17/7 


6/1 




11 




WASHINGTON CAT 












30/2 


8/2 


32/8 


35/3 




15 




WEST VIRGINIA 


30/1 


25/3 21/3 4/1 






5/1 


40/2 


14/4 


29/8 


20/4 




19 


6a 


WISCONSIN ggjjj 










5/1 


40/2 


17 
14/4 


/5 
29/8 


IZ 
20/4 




/14 
19 



WYOMING No program 






READING 

LIST OF STATES FOR QUALITY INDICATORS PROJECT 

7 - 9 10 - 12 







STATE 


WA 


VOC 


LC 


IC 


SS 


AT 




WA 


VOC 


LC IC 


SS 


AT 


Sub- 

skill 


4 


19 
17 


ALABAMA £jj 


8/2 


12/3 
30/2 


12/3 
8/2 


16/5 24/6 
32/11 15/2 






8/2 


13/3 


11/2 IB/4 30/6 
No Information 




17 


5 




ALASKA 
ARIZONA 


7 


7 


7 

CAT 


7 


7 










CAT 








3a 


23 


ARKANSAS 




16/4 


24/6 


16/4 


36/9 


















3a 


23 


CALIFORNIA 


15/1 


68/2 


48/3 235/15 36/2 








3/1 


47/4 50/5 


13/2 


_ 


12 






COLORADO 
































CONNECTICUT £JJJ ^ 
Master — 


26/2 
3/1 


27/5 285/16 54/9 
- 3/1 8/4 
/3 /16 /4 


21 




1/1 


26/2 


27/5 285/16 54/9 


21 


33 


3 


20 
7 


DELAWARE CTBS/U 
FLORIDA* 


5/1 
15/1 


40/2 
10/1 


2/1 
18/3 


43/13 20/3 
9/2 14/7 




[13* 5/1 

[23*- 


5/1 


CTBT 

5/5 15/5 
20/4 15/3 


20/3 


— 


14 


5 


o 


GEORGIA 




7 


72 


76 












72 710 


73 




15 


2 


14 


HAWAII 
IDAHO 




SAT 4 DAT 
3/1 25/6 


14/4 


22/3 


— 


CRT 




STAS 


/l 11 


/6 




0 
7 


1 


11 


ILLINOIS 




10/2 


10/2 


10/7 










13/2 


6/2 7/3 






7 


4 


15 


INDIANA 
IOWA 




15/3 


25/5 


35/7 




















2 


15 


KANSAS 


3/1 


6/2 


18/4 


15/5 


15/3 








3/1 


30/2 21/7 


6/2 




12 




20 


KENTUCKY 


5/1 


40/2 


2/1 


43/13 20/3 










CTBS/U 








2 


16 


LOUISIANA 


?4/5 


8/1 


20/5 


15/3 


8/2 






8/2 


12/3 


20/5 24/6 


8/2 




18 


2 


11 


MAINE 






13/3 


15/4 


12/4 










10/3 16/3 


12/4 




10 


5 


7 


MARYLAND 
MASSACHUSETTS 




















CAT 








1 


20 


MICHIGAN 


3/1 


15/3 


16/5 


24/7 


9/3 


15 




3/1 


15/3 


21/5 26/8 


7/4 


15 


24 



*Florida [1] = State Student Assessment Test - Part I 
[2] = State Student Assessment Test - Part II 






READING 

LIST OF STATES FOR QUALITY INDICATORS PROJECT 

7-9 10-12 





























oUD" 






STATE 


WA 


VOC LC 


IC 


SS 


AT 


WA 


VOC 


LC 


IC 


SS AT 


skill 




7 


Cl] 


54/2 


74/4 40/2 


23/7 


20/7 


— — 














5 


15 


MT hi IIP PrtTI ^ P o T 

MINNESOTA* [2] 


30/3 


30/3 33/2 


15/3 


18/3 




54/2 


71/4 


48/2+ 


24/? 


26/7 - 
[It 


( ) 


6c 

WW 




MISSISSIPPI 
















NEW 








2 




MISSOURI 


















12/6 






1 




MONTANA 
NEBRASKA 




No Program 










25/2 


1/1 


20/6 


30/4 15 




2 


10 


NEVADA 




— 24/4 


12/2 


24/4 


— 


Same (9-12 High School Prof, exam) 


10 


6a 




NEW HAMPSHIRE 


?( ) 


? ? 


7 


? 




70 


7 


7 


7 


7 


( \ 

\ I 


2 


20 


NEW JERSEY 1n 




12/1 13/4 


43/11 22/4 


— 


Same 


(9-12 exam) 






20 




25 


out 


15/5 


20/1 21/4 


34/10 20/5 


















20 


NEW MEXICO CTBS/U 


5/1 


40/1 2/1 


43/13 20/3 








CTBS 






3a 


1 


NEW YORK 






77/1 












77/1 




/I 


5 


17 


NORTH CAROLINA CAT 




30/2 8/2 


32/11 15/2 


— 




11 


/3 


/4 


/3 - 


11 






NORTH DAKOTA 




No program 
























OHIO 




No program 
























OKLAHOMA 




No program 




















3a 


18 


OREGON 


6/2 


13/2 21/5 


6/5 


13/4 






15/2 


10/2 


15(7) 


20/4 


( ) 


5 


16 


PENNSYLVANIA 
RHODE ISLAND 




73 72 
ITBS 


77 


74 






3/1 


7/2 


34/7 




10** 


6 


12 
20 


SOUTH CAROLINA**^ ^ 


7 72 
40/2 2/1 


78 72 
43/13 20/3 




7 


7 


72 


79 


72 - 


/13 






SOUTH DAKOTA 




No program 




















5 


15 


TENNESSEE 


72 


72 75 


74 


72 




Same 


(9-12 exam) 






/15 


3b 


11 


TEXAS 

UTAH CTBS/S 




71 72 


76 


72 






40/1 


3/1 


43/9 


20/4 — 


15 



*MN [1] = MSRI 
[2] = MSEA 



PA has a voluntary test at grade 11 (EQA) but at other grades 
has both voluntary and mandatory tests, so the mandatory information 
was coded at other grade levels. 




READING 



LIST OF STATES FOR QUALITY INDICATORS PROJECT 

7-9 10-12 



STATE 



WA VOC LC IC SS AT 



Sub- 



WA VOC LC IC SS AT skill 



6a 



VERMONT 

VIRGINIA 

WASHINGTON 



No program 



("Reading - Min. Comp." - No other info) 



6a 20 WEST VIRGINIA CTBS/U 5/1 40/2 2/1 43/13 20/3 - 

20 WISCONSIN CTBS/U 5/1 40/2 2/1 43/13 20/3 - 

WYOMING No program 



IB 






APPENDIX 16 






KEY: N = #s & Numeration 
V = Variables 
G = Geometry 
M = Measure 
S = Statistics 



Source 
Rating 

4 

1 



3 
3 



1 
4 



1 

5 



State 
ALABAMA 

ALASKA 
ARIZONA 
ARKANSAS 



CRT 
CAT 

CAT 
CRT 
SRA 



CALIFORNIA 
COLORADO 

CONNECTICUT CAEP 



KANSAS 

KENTUCKY CTBS/U 
LOUISIANA 



1 = Recall 

2 = Manip. 

3* = Explain 

4* = Prob. Slvg. 

* = higher order 

Gr. 1 - 3 
V G M S 



MATH 



SUMMARY 



KEY cont.; 



4-6 



# of items/# of subskills 
? = have but don't know 
# of items 



3* 4* 



35/6 4/1 4/1 16/3 — 12/3 30/5 7/2 /O 
49/10 3/2 1/1 13/6 — 2B/13 37/5 — 1/1 



49/10 3/2 1/1 13/6 
52/137 — 4/1 20/5 



— 23/13 37/5 — 1/1 
7 no Information — 



245/12 29/4 30/6 42.9 — 110/10 184/10 20/4 37/5 



4th 

6th 



DELAWARE CTBS/U 40/9 
FLORIDA 
GEORGIA 
HAWAII SAT 
CRT 

IDAHO 
ILLINOIS 
INDIANA 
IOWA 



— VI 3/2 -- 
49/5 19/3 - 

/a n n n -- 

85/11 6/1 5/4 9/3 ~ 
no info. - 



25/3 10/4 5/2 5/2 



30/6 ..'1 3/1 9/2 



40/9 



13/6 23/4 2/1 1/1 

20/4 48/4 

/8 /3 /l — 

29/9 58/5 5/2 13/3 



15/5 30/6 
18/5 27/5 



- 1/1 3/2 

2nd 52/8 4/1 

3rd* 76/8 4/1 4/1 16/3 



MAINE 

MARYLAND CAT 
MASSACHUSETTS 
MICHIGAN 

MINNESOTA J2^i 133 ^f 5 



49/10 3/2 1/1 13/6 



— 16/2 31/2 
/4 /l /10 



— 13/6 28/4 2/1 

— "36/7 16/1 4/1 

— 36/8 60/11 4/1 

— 28/13 37/5 — 



/O 



7 

n 



can't tell 
/16 /3 



1/1 



1/1 



7 

/4 



N 


V 


G 


M 


s 


1 


2 


3* 4* 


52/10 


3/1 


1 J /o 

14/2 


15/4 


4/1 


21/4 


41/9 


17/4 9/1 


68/15 


4/2 


5/3 


8/4 




17/9 


57/10 


2/2 9/3 


27/13 




4/2 


6/3 


— 


13/9 


18/6 


6/3 — 


68/15 


4/2 


5/3 


8/4 


— 


17/9 


57/10 


2/2 9/3 


52/13 




8/2 


12/3 




7 


no information -- 


57/10 




4/3 


9/3 




13/9 


43/4 


5/1 4/2 


294/21 60/6 


87/5 


30/4 


23/2 


98/12 255/15 71/7 70/4 


39/6 


4/1 


3/1 


14/3 




21/5 


33/4 


6/2 — 


272/17 48/3 


18/1 


64/5 




144/9 152/10 88/6 16/1 


100/10 12/3 


8/2 


20/4 




50/8 


56/10 28/6 16/4 


63/14 


9/5 


5/3 


8/3 




10/5 


64/12 


3/3 8/5 


88/8 


5/1 




23/3 




29/4 


83/7 


4/1 — 


/ll 


/3 


/4 


/2 


11 


/5 


/9 


n — 


96/13 


6/1 


5/2 


10/1 


1/1 118/18 51/7 


9/3 21/3 


9/6 


1/1 


11/4 

11/4 


6/2 




12/6 


9/5 


— 6/2 


30/3 


10/4 


5/2 


5/2 


5/2 


24/8 


31/9 




36/7 


6/1 


3/1 


12/3 




18/5 


36/6 


3/i«i|y. 


42/5 


6/1 


6/1 


3/1 


3/1 


3/1 


18/7 


3/l6 th 9T. 


63/14 


9/5 


5/3 


8/3 




10/5 


<a/12 


3/3 8/5 


60/6 


8/2 


12/2 


8/2 




24/6 


56/5 


8/1 — 


68/15 


4/2 


5/3 


8/4 




17/9 


57/1'J 


2/2 9/3 


87/12 




9/1 


9/2 




48/8 


51/51 


9/2 — 






same (Minn. Basic Math) 






MATH 



Gr. 1 - 3 



4-6 



Source 
Rating 


State 


N 


V 


G 


M 


S 


1 2 3* 


4* 


N 


V 


G 


M 


S 


1 


2 


3* 


4* 


6 


MISSISSIPPI 


new - 


no Info. 










new - 


no Info. 














2 


MISSOURI 
















15/5 


3/1 


6/2 


6/3 


6/3 


16/5 


15/6 


3/2 


2/2 


1 


MONTANA 
NEBRASKA 
















27/4 


9/1 


2/1 


1/1 






11/2 


— 


S/5 


6 


NEVADA SAT 
NEW HAMPSHIRE 
NEW JERSEY 


85/11 


6/1 


5/4 


9/3 




29/9 58/5 5/2 


13/3 


96/13 
/4 


6/1 

— — — 


5/2 

— - 


10/1 
/2 


1/1 118/18 51/7 
— 11 /3 


9/3 
/2 


21/3 
— 




NEW MEXICO CTBS/U 40/9 




1/1 


3/2 





13/6 W4 2/1 


1/1 


63/14 


9/5 


5/3 


8/3 




10/5 


64/12 


3/3 


8/5 


3 


NEW YORK 


48/3 




11/? 




6/1 


? can't tell 


7 


44/3 




13/7 


— 


9/7 


? 


can't tell 


? 




NORTH CAROLINA CAT 


49/10 


3/2 


1/1 


13/0 




2B/13 37/5 — 


1/1 


68/15 


4/2 


5/3 


8/4 




17/9 


57/10 


2/2 


9/3 




NORTH DAKOTA 




































OHIO 




































OKLAHOMA 


































3 


OREGON 
















67/11 






2/1 




2/1 


37/4 


13/4 


17/3 




PENNSYLVANIA 

TELLS 
















EGA-voluntary 














5 


/8 


11 


11 


/4 


/o 


/6 /7 /O 


11 


"ft 4 

TELLS 


HI 






1/1 


23/9 
/5 


"/i 1 


2/1 


3 fi 



n — /i 



RHODE ISLAND 

SOUTH CAROLINA CTBS/U 

have M-CRT at grades 1,2,3,6 

SOUTH DAKOTA 
TENNESSEE 
TEXAS 

UTAH CTBS/S 
VERMONT 

VIRGINIA SRA 
WASHINGTON CAT 
WEST VIRGINIA CTBS/U 40/9 
WISCONSIN CTBS/U 
WYOMING 



63/14 9/5 5/3 8/3 --- 10/5 64/12 3/3 8/5 
no other Information 



/4 /4 



1/1 3/2 — 13/6 2P/t 2/1 1/1 



/5 II II II 

63/14 9/5 5/3 8/3 

80/15 5/3 5/1 13/3 

57/10 — 4/3 9/3 

68/15 4/2 5/3 8/4 

63/14 9/5 5/3 8/3 

63/14 9/5 5/3 8/3 



12 It n --- 

10/5 64/1? 3/3 8/5 

14/7 68/10 6/2 15/3 

•3/9 43/4 5/1 4/2 

17/9 57/10 2/2 9/3 

10/5 64/12 3/3 8/5 

10/5 64/12 3/3 8/5 






MATH 



Gr. 7 - 9 



10 - 12 






Source 

Rating 

4 


State 


N V 


G 


M 


S 


1 


2 3* 


4* 


N 


V 


G 


M 


S 


1 


2 


3* 


4* 


ALABAMA j*[ 


61/6 8/2 
66/17 7/6 


12/3 

o/a 
»/* 


16/4 

7/1 


4/1 


8/2 
11/6 


41/8 8/1 
60/14 2/2 


44/5 
16/8 


44/7 


7/1 


14/2 


27/5 




6/2 


54/9 


6/2 


23/3 


5 


ALASKA 
ARIZONA CAT 


/10 11 
66/17 7/6 


/i 
9/4 


l\ 
1 0 

7/3 




.1/6 


/5 /3 
60/14 2/2 


16/8 






















ARKANSAS CRT 


/26? — 


4/1 

4/1 


8/2 


4/1 


7 


no information 7 




















3 


CALIFORNIA 

other: 


216/26 87/8 
15/1 


84/9 


30/4 


36/3 103/18 160/7 100/11 105/5 


126/9 


60/4 


24/3 


30/4 


14/4 


40/b 165/13 


— 


55/6 




COLORADO 




































CONNECTICUT CAEP 


48/8 4/1 


8/3 


10/5 


1/1 


17/5 


47/9 2/1 


5/3 


45/9 


4/1 


9/1 


10/5 


1/1 


11/5 


45/10 


3/2 


10/4 




Mastery 108/16 8/2 


12/3 


12/3 


4/1 


20/4 


72/12 36/6 


20/4 






















Prof. 


47/13 3/2 


4/3 


10/3 


1/1 


8/4 


43/12 6/3 


4/3 






















DELAWARE CTBS/U 


61/15 13/7 


4/3 


7/3 




10/6 


64/16 1/1 


10/5 




















2 


FLORIDA 


95/6 4/1 




15/2 






84/7 20/1 


10/1 


35/2 
80/7 


5/1 


5/1 


25/5 


5/1 


5/1 
20 


30/4 
60 


— 


40/2 


2 


GEORGIA 
HAWAII 

IDAHO 


/a /3 n 

SAT 4 DAT 
48/72 — 13/4 


/? 
18/4 


11 
3/1 


/« 
5/4 


/10 /5 
70/10 10/2 


H 

~i 


/10 
&TEC/3 


/3 




/3 


/2 
/4 
STAS 


11 




/4 


/10 

IS 


/5 




— 
12 


1 


ILLINOIS 


12/7 4/3 


26/12 


7/2 




6/4 


24/16 2/3 


17/3 


6/4 


7/2 


23/8 


5/2 




7/4 


15/5 


3/1 


16/5 


4 


INDIANA 
IOWA 


































2 


KANSAS 
KENTUCKY 


48/8 — 


3/1 


3/1 


3/1 


9/2 


39/7 — 


9/2 


42/1 


3/1 


6/2 


6/2 


3/1 




9/3 




51/3 


2 


LOUISIANA 


60/8 4/1 


4/1 


4/1 


4/1 


8/2 


60/9 — 


8/1 


56/6 


8/2 


8/1 


8/1 






72/9 




8/1 


2 


MAINE 



































MARYLAND 



CRT ????????? 

66/17 7/6 9/4 7/3 0 11/6 60/14 2/2 16/8 



CAT 

MASSACHUSETTS 
I MICHIGAN 

> MINNESOTA §EflM 108/fc 36/3, 




81/11 3/1 12/3 12/2 — 39/8 60/7 9/2 — 




72/9 3/1 15/3 12/8 6/2 12/3 78/11 12/3 6/2 
91/10 51/4 30/2 20/2 8/7 7 can't tell 7 29/3 






MATH 



Gr. 7 - 9 



Source 
Rating 

6 
2 
1 



10 - 12 



2 
6 

2 

3 
5 



3 
5 



5 
3 






State 

MISSISSIPPI 

MISSOURI 

MONTANA 

NEBRASKA 
NEVADA 

NEW HAMPSHIRE 

NEW JERSEY («*) 
(In) 

NEW MEXICO CTBS/U 
NEW YORK 

NORTH CAROLINA £J| 

NORTH DAKOTA 
OHIO 
OKLAHOMA 
OREGON 

EOA-vo 
PENNSYLVANIA I* 

TELLS 

RHODE ISLAND 

SOUTH CAROLINA 

CTBS/U 

SOUTH DAKOTA 

TENNESSEE 

TEXAS 

UTAH CTBS/S 
VERMONT 
VIRGINIA 
WASHINGTON 
WEST VIRGINIA 
WISCONSIN 
WYOMING 



N 


V 


G 


M 


S 


1 


2 


3* 


4* 


N 


V G 


M 


S 1 


2 


3* 


4* 


new - 


no Info. 














new ■ 


- no Info. 






























15/2 


3/1 6/4 


6/3 


6/3 12/5 


15/6 


1/1 


8/4 




















35/3 


12/1 11/1 


1/1 




12/1 






30/5 


12/1 


6/1 


46/5 




10/3 


74/6 


4/1 


6/2 




same (9-12) 












/« 


/2 


/2 


/l 


/3 


/2 


/10 


/l 


/l 


/4 


/2 /3 


/l 


11 11 


/9 


to 


/o 


65/9 


5/1 


12/2 


10/6 




11/3 


57/8 


6/2 


19/5 
















57/10 


9/2 


l i/i 


15/2 


1/1 


5/2 


48/10 


9/2 


31/4 
















61/15 13/7 


4/3 


7/3 




10/6 


64/16 


1/1 


10/5 
















/6 


12 


/3 


— 


/l 


/2 


/7 


/I 


/2 


same 


(9-12) 






























? 


? ? 


? 


? ? 


? 


7 


V 


66/17 


7/6 


9/4 


7/3 


— — _ 


ll/6 


60/14 


2/2 


16/8 
















59/12 













34/5 


10/4 


15/3 


58/9 




— 


— 2/1 


35/5 


1/1 


20/2 


1. 

38/17 


5/3 


9/5 


6/3 


1/1 


13/7 


31/15 


1/1 


I 

10/6 1 


35/13 


7/3 9/6 


6/4 


3/2 lb/7 


35/15 


1/1 


8/5 


/12 


/2 


/4 


/l 


/2 


/6 


/12 


/2 


/l 












have M-CRT at 8th - no info. 


10/6 








have M-CRT at gr. 11 - no info. 








61/15 13/7 


4/3 


7/3 




64/16 


1/1 


10/5 












/9 




/l 


/3 


/l 


/3 


/ll 


/l 




same (9-12) 












/5 




/l 


/l 


/l 


/l 


/5 


/o 


/3 


































63/16 


18/6 5/4 


8/3 


— 11/6 


66/16 


7/4 


10/3 



WASHINGTON CAT 66/17 7/6 


9/4 


7/3 


— 11/6 


60/14 


2/2 


16/8 


WEST VIRGINIA CTBS/U61/15 13/7 


4/3 


7/3 - 


- 10/6 


64/16 


1/1 


10/5 


WISCONSIN CTBS/U 61/15 13/7 


4/3 


7/3 - 


- 10/6 


64/16 


1/1 


10/5 



10th gr. math CRT - no other Info, 






APPENDIX 17 



WRITING 



GRADES 1 - 3 GRADES 4 - 6 

Source 

Rating State CO GR WU OR AT SM CO GR WU OR AT SM 



4 


CRT 

ALABAMA JJj 
ALASKA 


49/5 
■f LI 3 

40/3 


5/1 


1V3 4/1 
20/3 — 


— 


— 


45/3 


11/2 


20/4 
15/3 


lfi/3 
6/1 








ARIZONA CAT 


40/3 


5/1 


20/3 — 


— 


— 


45/3 


11/2 


15/3 


6/1 


— 


— 




ARKANSAS SRA 












54/3 


5/2 


25/5 


— 


— 


— 


3 


CALIFORNIA 


129/3 


60/1 


90/5 45/3 






110/6 


67/3 


113/8 


62/5 








COLORADO 
CONNECTICUT^ 












CAEP 5/3 
4th 21/4 
6th /3 


/l 


12/4 
15/3 
/l 


3/2 
/2 


19 
— 


8 

1(H,A) 
KM) 


3 


DELAWARE CTBS/U 

i^0Rmr------->. 


2/1 


8/2 
— 


10/2 — 
- 9/1 


— 
— 




70/5 
24/2 


14/3 
20/4 


17/4 
— 


10/2 
9/2 


— 
— 


— 
— 




GEORGIA 

COT 

HAWAII «f 


53/3 


"in 


no info. — 
12/4 — 






63/3 


15/1 


13/3 










IDAHO 
























1 


ILLINOIS 












20/4 


5/1 


17/2 




32 




4 


INDIANA (new) 
IOWA 
KANSAS 


? 


? 






1/1 


? 


? 








? 




KENTUCKY CTBS/U 


2/1 


8/2 


10/2 - 






70/5 


14/3 


17/4 


10/2 






2 


LOUISIANA 


16/3 




4/1 - 




3/1 P 














MAINE 






















X(H,P, 




MARYLAND CAT 


40/3 


5/1 


20/3 - 






45/3 


11/2 


15/3 


6/1 







MASSACHUSETTS 

MICHIGAN 

MINNESOTA 



KEY: 

# of items/# of subskills 
CO = Conventions (e.g., spell., capit., punct.) 
GR = Grammar (sentence structure) 
WU = Word Usage 
OR = Organization 
AT = Attitude 
SM = Writing Sample 
? = Unknown # of items and subskills 






WRITING 



GRADES 1 - 3 

Source 

Rating State CO GR WU OR AT SM 

6 MISSISSIPPI NEW 
2 MISSOURI 
1 MONTANA 
NEBRASKA 

NEVADA SAT 50/3 10/1 12/4 — — — 

NEW HAMPSHIRE 
NEW JERSEY 

NEW MEXICO CTBS/U 2/1 8/2 10/2 — — — 
3 NEW YORK 

NORTH CAROLINA CAT 40/3 5/1 20/3 — — ~ 

NORTH DAKOTA 

OHIO 

OKLAHOMA 
3 OREGON 
5 PENNSYLVANIA 

RHODE ISLAND 

SOUTH CAROLINA 

SOUTH DAKOTA 
5 TENNESSEE 

3 TEXAS /4 II Jl - - 1/1 

UTAH CTBS/S 
VERMONT 
VIRGINIA SRA 
WASHINGTON CAT 

WEST VIRGINIA , 2/1 8/2 10/2 - — — 
CTBS/U 

WISCONSIN CTBS/U 

WYOMING 



GRADES 4 - 6 

CO GR WU OR AT SM 
NEW 

8/3 - - 6/2 - — 

11/2 - - — 15 ~ 

63/3 15/1 13/1 — 



70/5 14/3 17/4 10/2 - 
------ 2/(H) 

45/3 11/2 15/3 6/1 - 



13/5 6/2 5/2 4/2 - 1/(H) 
5/2 20/3 5/2 7/3 - — 

W TESTED AT GRADE 6 - NO INFO 
70/5 14/3 17/4 10/2 - 



/4 /l II ~ ~ 1/ 
70/3 - 24/5 11/2 - — 



54/3 


5/2 


25/5 




45/3 


11/2 


15/3 


6/1 - - 


70/5 


14/3 


17/4 


10/2 - 


70/5 


14/3 


17/4 


10/2 ~ — 






WRITING 



Source 
Rating 



GRADES 7-9 



GRADES 10 - 12 



6 
2 
1 



State 


CO 


GR 


WU 


OR 


AT 


SM 


CO 


GR 


WU 


OR 


AT 


SM 


CRT 

ALABAMA ™ 


39/3 
45/3 


_ 

14/3 


15/3 
12/3 


27/4 
11/4 


— 




43/3 




24/3 


43/6 




« »■ 


ALASKA 


























ARIZONA CAT 


45/3 


14/3 


12/3 


11/4 


















ARKANSAS 


























CALIFORNIA 


123/3 


62/2 


82/4 


136/5 


m m 




124/3 


52/2 




38/3 






COLORADO 


























CONNECTICUT 
CAEP 29/3 


11 
1/1 
6/3 


/l 
6/3 
40/6 


/2 
6/2 
17/7 


— 

41 


KH,A) 
IK?) 


29/3 


6/3 


40/6 


17/7 


41 


IK?) 


DELAWARE CTBS/U 


66/3 


8/2 


17/4 


20/5 


















FLORIDA 


28/3 


15/3 


5/2 


15/3 






35/3 


5/2 


5/2 


20/2 






GEORGIA 


























HAWAII 
IDAHO 


21/1 


SAT & OAT 






(H) 


CRT ~ 


— 


STAS 


11 


— 


3(?) 


ILLINOIS* 


2/1 


5/1 


13/1 


14/1 


32 




2/1 


5/1 


13/1 


14/1 


32/ 




INDIANA (new) 


? 


? 








? 














IOWA 


























KANSAS 


























KENTUCKY CTBS/U 


66/3 


8/2 


17/4 


20/5 


















LOUISIANA 


24/3 


8/1 


12/3 






2(P) 


40/3 


12/1 


4/1 


4/1 




2(P) 


MAINE 












?(H,P,A) 












?(H,P,A) 


MARYLAND 


(NOT MUCH INFO) 
45/3 14/3 12/3 


11/4 




2(?) 















MASSACHUSETTS 

MICHIGAN 

MINNESOTA 

MISSISSIPPI 

MISSOURI 

MONTANA 



(NEW) 



(NEW) 

4/2 1/1 1/1 9/5 - 
6/2 1/1 8/5 - 15/ 



plus 16 "Mixed" Items 






WRITING 



Source 
Rating 



GRADES 7 - 9 



State 



CO GR WU 
(NEW) 



OR AT 



12/3 18/3 1?.'4 
66/3 8/2 17/4 



24/4 
20/5 



MISSISSIPPI 
NEBRASKA 
NEVADA 

NEW HAMPSHIRE 
NEW JERSEY 
NEW MEXICO CTBS/U 
NEW YORK - — — — 

NORTH CAROLINA CAT 45/3 14/3 12/3 11/4 
NORTH DAKOTA 
OHIO 
OKLAHOMA 
OREGON 

PENNSYLVANIA* 
RHODE ISLAND 



26/4 4/1 5/2 
4/2 22/3 19/3 



3/2 
17/3 



SOUTH CAROLINA CRT 

CTBS/U 

SOUTH DAKOTA 

TENNESSEE 

TEXAS 

UTAH CTBS/S 

VERMONT 

VIRGINIA SRA 

WASHINGTON 

WEST VIRGINIA 
CTBS/U 

WISCONSIN CTBS/U 

WYOMING 



W TESTED AT 8 ■ 
66/3 8/2 17/4 

/3 /3 /5 
/4 II II 



NO INFO 

66/3 8/2 17/4 
66/3 8/2 17/4 



20/5 ~ 
20/5 ~ 



SM 

2(H) 
3(H) 
3(H) 



- 1(H) 



NO INFO 
20/5 - 



/l - - 



GRADES 10 - 12 
CO GR WU OR AT SM 
(NEW) 

SAME (9 - 12) 



SAME (9 - 12) 



3(H) 



5/1 16/2 24/3 22/3 — 
W TESTED AT Gr. 10 - NO INFO 

SAME (9 - 12) 

50/3 — 25/5 10/2 — 



2(H) 



Voluntary 






APPENDIX 18 



SUMMARY OF NUMBERS OF ITEMS AND SUBSKILLS IN EACH 
CELL OF MATH MATRIX FOR GRADES 4-6 AND 7-9 IN 
CALIFORNIA, ALABAMA, FLORIDA, LOUISIANA, PENNSYLVANIA 



MATH 

CALIFORNIA GR 6                                   KEY: items/subskills 

              RECALL       ROUTINE MANIP    EXPLAIN          PROB SOLV        TOTAL 
              (facts,      (compute,        (estimate,       (h'ru word 
              terms,       simple word      select algo,     probs, apply 
              symbols)     problems)        translate)       theorems) 

Numbers       52/8         175/7            54/5             13/1             294/21 
Variables     3/1          35/3             7/1              15/1             60/6 
Geometry      43/3         12/1             0                32/1             87/5 
Measurement   0            10/2             10/1             10/1             30/4 
Statistics    0            23/2             0                0                23/2 
TOTAL         98/12        255/15           71/7             70/4             494/38 



CALIFORNIA GR 8 

              RECALL       ROUTINE MANIP    EXPLAIN          PROB SOLV        TOTAL 

Numbers       44/10        84/8             64/7             24/1             216/26 
Variables     10/1         30/4             25/2             22/1             87/8 
Geometry      45/6         6/1              7/1              26/1             84/9 
Measurement   4/1          4/1              4/1              18/1             30/4 
Statistics    0            36/3             0                0                36/3 
Other                                                        15/1             15/1 
TOTAL         103/18       160/17           100/11           105/5            468/51 

Other: prob solv w/ maps, signs, ads, schedules, charts 



MATH 



ALABAMA GR 6 

              RECALL    ROUTINE   JUDGE,      PROB SOLV   TOTAL 
                        MANIP     TRANSLATE 

Numbers       8/2       18/3      17/4        9/1         52/10 
Variables     0         3/1       0           0           3/1 
Geometry      9/1       5/1       0           0           14/2 
Measurement   4/1       11/3      0           0           15/4 
Statistics    0         4/1       0           0           4/1 
TOTAL         21/4      41/9      17/4        9/1         88/18 

ALABAMA GR 9 

              RECALL    ROUTINE   JUDGE,      PROB SOLV   TOTAL 
                        MANIP     TRANSLATE 

Numbers       0         21/3      8/1         32/2        61/6 
Variables     0         4/1       0           4/1         8/2 
Geometry      4/1       4/1       0           4/1         12/3 
Measurement   4/1       8/2       0           4/1         16/4 
Statistics    0         4/1       0           0           4/1 
TOTAL         8/2       41/8      8/1         44/5        101/16 






MATH 



FLORIDA GR 5 

              RECALL    ROUTINE   JUDGE,      PROB SOLV   TOTAL 
                        MANIP     TRANSLATE 

Numbers       25/3      59/4      4/1         0           88/8 
Variables     0         5/1       0           0           5/1 
Geometry      0         0         0           0           0 
Measurement   4/1       19/2      0           0           23/3 
Statistics    0         0         0           0           0 
TOTAL         29/4      83/7      4/1         0           116/12 

FLORIDA GR 8 

              RECALL    ROUTINE   JUDGE,      PROB SOLV   TOTAL 
                        MANIP     TRANSLATE 

Numbers       0         75/5      20/1        0           95/6 
Variables     0         4/1       0           0           4/1 
Geometry      0         0         0           0           0 
Measurement   0         5/1       0           10/1        15/2 
Statistics    0         0         0           0           0 
TOTAL         0         84/7      20/1        10/1        114/9 






MATH 



LOUISIANA GR 4 

              RECALL    ROUTINE   JUDGE,      PROB SOLV   TOTAL 
                        MANIP     TRANSLATE 

Numbers       12/3      40/2      8/1         0           60/6 
Variables     4/1       4/1       0           0           8/2 
Geometry      4/1       8/1       0           0           12/2 
Measurement   4/1       4/1       0           0           8/2 
Statistics    0         0         0           0           0 
TOTAL         24/6      56/5      8/1         0           88/12 

LOUISIANA GR 7 

              RECALL    ROUTINE   JUDGE,      PROB SOLV   TOTAL 
                        MANIP     TRANSLATE 

Numbers       8/2       44/5      0           8/1         60/8 
Variables     0         4/1       0           0           4/1 
Geometry      0         4/1       0           0           4/1 
Measurement   0         4/1       0           0           4/1 
Statistics    0         4/1       0           0           4/1 
TOTAL         8/2       60/9      0           8/1         76/12 






MATH 



PENNSYLVANIA GR 5 "EQA" (voluntary in '84) 

              RECALL    ROUTINE   EXPLAIN     PROB SOLV   TOTAL 
                        MANIP 

Numbers       11/6      22/6      2/1         1/1         36/14 
Variables     0         2/1       0           0           2/1 
Geometry      4/2       3/1       0           0           7/3 
Measurement   8/2       2/2       0           2/1         12/5 
Statistics    0         1/1       0           0           1/1 
TOTAL         23/10     30/11     2/1         3/1         58/24 

PENNSYLVANIA GR 5 "TELLS" (number of items unspecified) 

              RECALL    ROUTINE   EXPLAIN     PROB SOLV   TOTAL 
                        MANIP 

Numbers       /3        /3        0           0           /6 
Variables     0         /1        0           0           /1 
Geometry      /1        /1        0           0           /2 
Measurement   /1        /1        0           /1          /3 
Statistics    0         0         0           0           0 
TOTAL         /5        /6        0           /1          /12 

PENN GR 8 "EQA" (voluntary in '84) 

              RECALL    ROUTINE   EXPLAIN     PROB SOLV   TOTAL 
                        MANIP 

Numbers       11/4      26/11     1/1         1/1         39/17 
Variables     0         0         0           5/3         5/3 
Geometry      5/2       3/2       0           1/1         9/5 
Measurement   2/1       1/1       0           3/1         6/3 
Statistics    0         1/1       0           0           1/1 
TOTAL         18/7      31/15     1/1         10/6        60/29 






MATH 



PENNSYLVANIA GR 8 "TELLS" (number of items unspecified) 

              RECALL    ROUTINE   EXPLAIN     PROB SOLV   TOTAL 
                        MANIP 

Numbers       /2        /7        /2          /1          /12 
Variables     /1        /1        0           0           /2 
Geometry      /2        /2        0           0           /4 
Measurement   /1        0         0           0           /1 
Statistics    0         /2        0           0           /2 
TOTAL         /6        /12       /2          /1          /21 






MATH 

CTBS/U - Grade 6                                  KEY: items/subskills 

DE, KS, NM, 
SC, UT, WI 

              RECALL    ROUTINE   EXPLAIN     PROB SOLV   TOTAL 
                        MANIP 

Numbers       5/2       53/8      2/2         3/2         63/14 
Variables     1/1       6/2       0           2/2         9/5 
Geometry      4/2       1/1       0           0           5/3 
Measurement   0         4/1       1/1         3/1         8/3 
Statistics    0         0         0           0           0 
TOTAL         10/5      64/12     3/3         8/5         85/25 
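
The row and column totals in these matrices are simple sums over the cells, with items and subskills added separately. A minimal sketch in Python, using the CTBS/U Grade 6 cell values from the matrix above, shows the arithmetic. The names `MATRIX` and `add_cells` are illustrative assumptions, not part of the report.

```python
# Cell values (items, subskills) copied from the CTBS/U Grade 6 matrix.
# Column order: Recall, Routine Manip, Explain, Prob Solv.
MATRIX = {
    "Numbers":     [(5, 2), (53, 8), (2, 2), (3, 2)],
    "Variables":   [(1, 1), (6, 2), (0, 0), (2, 2)],
    "Geometry":    [(4, 2), (1, 1), (0, 0), (0, 0)],
    "Measurement": [(0, 0), (4, 1), (1, 1), (3, 1)],
    "Statistics":  [(0, 0), (0, 0), (0, 0), (0, 0)],
}

def add_cells(cells):
    """Sum items and subskills separately over a group of cells."""
    cells = list(cells)
    return (sum(c[0] for c in cells), sum(c[1] for c in cells))

# One total per skill area (the TOTAL column of the matrix).
row_totals = {area: add_cells(cells) for area, cells in MATRIX.items()}

# One total per hierarchy level (the TOTAL row of the matrix).
column_totals = [add_cells(column) for column in zip(*MATRIX.values())]

# Grand total in the lower-right corner.
grand_total = add_cells(row_totals.values())
```

Computed this way, the margins agree with the printed TOTAL row and column (for example 63/14 for Numbers and 85/25 overall), which is a useful check when a scanned cell is hard to read.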






APPENDIX 19 

STQI Project Coding of Reporting Practices and 
Auxiliary Information 

State: 

Program: Minimum Competency Assessment (Testing) 

Title of Document(s): 
Description of Purpose of Report: 

• Audience: 

• Authoring Agency: 

• Authors: 

• Date of Report: 

• Stated Purpose or Objectives: 



• Type of Report: Results Technical 

I. General Description 

A) Type of Test: Commercial (i.e., Published Standardized Measure from Vendor) 

Private (i.e., Privately or Internally Developed) 

B) Name of Test: 

C) Version or Edition: 

D) Enter Dates of Testing (Month and Periodicity) and Nature of Testing (Census, Sample) 



Subject 
Area 


Grade Level 

K 1 2 3 4 5 6 7 8 9 10 11 12 


Reading 




Math 




Language 
Arts 




Writing 




Other: 
















II. Reported Results 



A) Metric: 

1. Indicate the type of scale(s) used to report results: 
Raw Score 

Scale Score (define: ) 

Percent Correct 

Grade Equivalent 

Percentile Rank 

NCE's 

Stanine 

Other 



2. Which of the above are most frequently used? 



3. Enter Statistic (Measure of Central Tendency) used for Reporting 

Grade Level 



Metric 


K 1 2 3 4 5 6 7 8 9 10 11 12 


Raw Score 




Percent 
Correct 




Grade Equiv. 




Percentile 
Rank 




Scale Scores: 




Stanines 




NCE's 




Z-Scores 




T-Scores 




Other: 




















B) Student Subgroup Definitions (Check all that apply) 

- Racial/Ethnic Groups (List groups identified in Report) 



- Sex 

- Special Programs (List all programs identified in Report) 



- Language Status (List all groups identified in Report) 



- Other (List characteristics and groups identified in Report) 



C) School/District Groupings (Check all that apply and enter groups identified) 

School District 

Size 

Geographic Location 

Program Types 

Socio-Economic 

Other (specify) 






D) Descriptive Statistics 

(Enter grade levels and subject areas at which the descriptive statistic cited is 
given for each grouping.) 

All Type 1 Type 2 Type 3 Type 4 

Students subgroups subgroups subgroups subgroups 

( . ) ( ) ( ) ( ) * 

I. Central 
Tendency 
Measures 

Mean 

Median 

Mode 

Other (Name) 



II. Variability/ 
Dispersion 
Measures 

Stand. Dev. 

Variance 

Range 

Other(Name) 



III. Distributional 
Information 

Quartiles 

Deciles 

Quintiles 

IV. Frequencies 
of students 
attaining 
each: 

Raw Score 
% Correct 
Other 



V. Percentages 
of students 
attaining 
each: 

Raw Score 

% Correct 

Other 

Cut Point: above 

below 






E) Longitudinal Information: 

• Longitudinal Information Present Yes No 
• Cohort Reported: 

- Same Students (Other ) tracked 

- Same Grades (Different Students) 

- Other: 

* Period for Reported Data: 

* Number of Time Points: 



Periodicity (How often conducted): 



Type of Statistic Reported at each Point: 
- Measure of Central Tendency: 



- Measure of Variability: 



- Other: 



• Test/Measure Stability: 

* Number of years with same measure: 
- Name of current measure: 



- Name of previous measure (If any): 

- Nature and reason for change: 






F) Supplemental Analyses: 

• Psychometric Analyses: 1) Item Analyses: Difficulty 

Point Biserial 

Item Characteristic 
Information 

Other: 

2) Reliability: Internal Consistency 

Split-Half 
Test-Retest 






3) Validity: Concurrent 
Content 
Predictive 
Construct 

• Factor Analyses: (describe use) 



• Curricular Match: (describe) 






• Test Bias Analyses: (describe) 






• Teacher Analyses: (describe) 







G) Volume of Data: 

• Number of Students Tested: Total 

Per Grade Level 



H) Non-Test Information Collected 

1. Types of Additional Information Collected: 

- Attitudinal: 

- Demographic: 

- Other: 



2. Level (Respondent Level) of Information Collected: 

- Student 

- School 

- District 

- Other 






I) Other Reports Available from Program: 



J) Other Comments: 






APPENDIX 20 






ADDENDUM 



Linking State Educational Assessment Results: A Feasibility Trial 

Prepared by R. Darrell Bock 
National Opinion Research Center, University of Chicago 

November, 1985 



Recent developments in the technology of educational 
measurement present opportunities for obtaining comparative 
information on educational progress in the states. This concept 
paper reviews some of these advances and outlines a proposed 
feasibility trial of one of them. 

1. Background 

Although the sample surveys conducted by the National 
Assessment of Educational Progress (NAEP) provide accurate 
measures of educational outcomes for the nation as a whole, the 
sampling rates are too low to enable reporting for geographical 
areas smaller than the four main regions — Northeast, Southeast, 
Midwest and West. As a result, no between-state comparisons of 
outcomes, or comparisons of state results with the national 
average, are possible within the present budgetary limitations of 
NAEP. Several strategies exist, however, for obtaining such 
information. One that has already been proposed is for states to 
bear the cost of extending the NAEP sample to enough students 
from their schools to ensure a dependable state average. As a 
very rough estimate, the marginal cost to each state for the 
additional sampling might be $150,000. 

States that already have system-wide attainment testing 
programs in operation could, however, obtain comparable or better 
information at less cost by making use of item-response theoretic 
(IRT) methods for linking of test scales (Lord, 1980). These 
methods would permit the states to express the scores of their 
present tests on a common scale, which could be linked to the 
NAEP scale. The equating procedures require only that a small 
number of common, or "anchor" items, from each of the state tests 
be present in a specially prepared equating test that is 
administered to a broadly representative group of students at the 
relevant grade level. The scaling of items in this equating test 
can then be propagated back to the state test in order to define 
a scale with common origin and unit of measurement in all of the 
tests. If scaled NAEP items are also included, the common scale 
can be related to NAEP results. Apart from this one-time study 
establishing the equating links (which would need to be repeated 
when a state's test changed), the annual scoring of state 
results on the common scale would be a straightforward computer 
operation costing perhaps $100 per 10,000 students. 
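
A brief note on why a common origin and unit are all that linking must supply. The sketch below assumes the two-parameter logistic model, one of the models treated by Lord (1980); the addendum does not say which model would be used, and the symbols a_j (item discrimination), b_j (item difficulty), theta (student attainment), and the constants A > 0 and B are notation introduced here for illustration.

```latex
P_j(\theta) = \frac{1}{1 + \exp\{-a_j(\theta - b_j)\}}

\theta^{*} = A\theta + B, \qquad
b_j^{*} = A b_j + B, \qquad
a_j^{*} = \frac{a_j}{A}
\quad\Longrightarrow\quad
a_j^{*}(\theta^{*} - b_j^{*}) = a_j(\theta - b_j)
```

Because the response probabilities are unchanged by such a linear transformation, each separately scaled state test is determined only up to its own A and B. Under this assumed model, the anchor items serve to fix those two constants for each state.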






There are two possible approaches to creating the special 
equating tests: 

1.1. For those states that are already testing closely 
similar subject matter, the simplest approach is for them to 
contribute to the equating test three to six of their items in 
each skill area for which scales are to be constructed. These 
items, plus some scaled NAEP items in the same areas, would then 
be administered under uniform conditions in a few selected 
schools in the participating states. Since the results would be 
used only in test linking and not for describing attainment, the 
sampled schools would not need to be representative of the state. 
It is only necessary that the full range of student attainment be 
covered. The data obtained in this way in the participating 
states would then be collated for IRT scaling. Similar scaling 
of the state tests from which these items arose would also be 
carried out separately on operational data supplied by each 
state. The item scale parameters of the anchor items would then 
be used to adjust all of the state results to the same origin and 
unit of measurement. Using these results, each state could 
express the attainment of pupils or schools in terms of this 
common scale. All participating states' results would then be 
comparable and could be related to the corresponding NAEP scale 
if NAEP items were included. Even commercial tests could be 
included in the linking, provided the publishers would agree to 
this use of some of their items. 
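
The adjustment to a common origin and unit described in 1.1 can be sketched in code. This is a minimal illustration only, assuming the mean-sigma linking method applied to anchor-item difficulties; that method is one common choice and is not named in the addendum. The function names, the anchor difficulty values, and the school means are all hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_link(b_state, b_common):
    """Return (A, B) such that theta_common = A * theta_state + B.

    b_state:  anchor-item difficulties estimated on a state's own scale
    b_common: difficulties of the same items from the equating test
    """
    if len(b_state) != len(b_common) or len(b_state) < 2:
        raise ValueError("need at least two matched anchor items")
    A = pstdev(b_common) / pstdev(b_state)   # ratio of the two units
    B = mean(b_common) - A * mean(b_state)   # shift of origin
    return A, B


def to_common_scale(thetas, A, B):
    """Re-express attainment estimates (or school means) on the common scale."""
    return [A * t + B for t in thetas]


# Hypothetical difficulties for four anchor items (not real data).
b_state = [-1.0, 0.0, 1.0, 2.0]
b_common = [-1.5, 0.5, 2.5, 4.5]

A, B = mean_sigma_link(b_state, b_common)

# Hypothetical school means on the state's own scale, moved to the common scale.
school_means_common = to_common_scale([-0.5, 0.0, 0.5], A, B)
```

With three to six anchor items per skill area, as proposed above, the constants would rest on very few items, so in practice one would want to inspect each anchor item for stability before relying on the link.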

1.2. If the states are not already testing in comparable 
subject-matter skill areas, a more extensive initial effort would 
be required. Curriculum experts from each of the participating 
states would have to meet and agree on the content of the areas 
to be tested. They would then have to assemble and select items 
representing this content. Some new items might have to be 
written, but for the most part existing items from state testing 
programs and from NAEP could be used. This newly constructed 
equating test would then be administered to a broad sample of 
students at the relevant grade level and the results subjected to 
IRT analysis as above. Each state could then insert some of the 
scaled items from the equating test into new tests devised for 
its own program. By scoring the new tests by IRT methods anchored 
on these scaled items, each state could then express its outcome 
measures on the same scale for purposes of comparison with other 
states or with national results. 

In addition to the economy of these linking strategies for 
comparing educational outcomes in the states, they have several 
advantages over the alternative of extending the NAEP sample: 
1) no additional operational testing beyond that of the existing 
state program would be required, 2) the state would have results 
for all students included in the existing state program, not just 
those in the probability sample collected by NAEP, 3) the 
objectives and content of the state testing would not be 
determined or limited by NAEP policy and practices in assessment, 
4) commercial as well as state testing organizations could 
participate, 5) new avenues for communication between the state 
testing programs would be opened, and the capabilities of the 
programs would be strengthened, and 6) in the course of choosing 
content and skills to be included in the equating, greater 
consensus between the states on curriculum problems would be 
fostered. 

2. Proposal for an Initial Feasibility Trial 

Results of a recent study by Burstein et al. (1985) reveal 
sufficient communality of test content at the eighth grade level 
to support a trial of the first of these two linking methods in a 
number of states. It is proposed that five of these states join 
in a pilot study to evaluate procedures for this purpose and to 
develop prototypes of documents for reporting and comparing state 
educational outcomes. The study would be limited to measures of 
1) reading proficiency and 2) basic mathematical skills, assessed 
in three schools in each of these states during the spring term. 
A high, middle and low SES school should be enlisted for this 
purpose by each of the respective state education offices. Each 
school would be requested to make one fifty-minute class period 
available for administration of the equating test to all or most 
of its eighth grade students. 

The states should be selected to include at least one that 
employs traditional individual student achievement testing and 
one that employs matrix sampled assessment. In addition, at least 
one of the states should routinely test in the autumn in grade 
eight and one in the spring of grade seven. States on either 
plan present a special problem in equating because the scores 
from the earlier testing or different grade level must be 
adjusted to their predicted values for the standard testing time 
and grade level. So that corrections of scores for 
nontypical testing times can be estimated, those states not 
testing in the spring of grade eight should then test all 
students in the pilot schools in both grade eight and grade 
seven. 
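
One way the correction for nontypical testing time might be estimated from the two-grade pilot data is sketched below. This is an illustration under strong assumptions, not the procedure the addendum specifies: it uses a simple shift by the mean grade-seven to grade-eight difference and assumes growth is steady through the year, whereas the "predicted values" mentioned above suggest a regression-based prediction could be used instead. All function names, scores, and the half-year fraction are hypothetical.

```python
from statistics import mean


def spring_to_spring_gain(grade7_scores, grade8_scores):
    """Mean grade 7 to grade 8 difference on the common scale (pilot schools)."""
    return mean(grade8_scores) - mean(grade7_scores)


def adjust_to_standard_time(score, gain, years_before_standard):
    """Shift a score forward to spring of grade eight.

    years_before_standard: 1.0 for a spring grade-seven testing; some
    fraction of a year for an autumn grade-eight testing (assumes steady
    growth through the year, which is an assumption of this sketch).
    """
    return score + gain * years_before_standard


# Hypothetical common-scale scores from the pilot schools (not real data).
grade7 = [-0.4, 0.0, 0.4]
grade8 = [0.1, 0.5, 0.9]

gain = spring_to_spring_gain(grade7, grade8)

# A state mean of 0.2 observed in spring of grade seven ...
adjusted_spring_g7 = adjust_to_standard_time(0.2, gain, 1.0)
# ... and the same observed mean from an autumn grade-eight testing,
# treated here as half a year early (hypothetical fraction).
adjusted_autumn_g8 = adjust_to_standard_time(0.2, gain, 0.5)
```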

Each state would contribute four items each from its current 
reading and mathematics tests for grade eight. NAEP would be 
requested to provide an additional four scaled items in each of 
these subject-matter areas. These items would be assembled into 
a 48-item expendable-form test intended for non-speeded 
administration. 
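
The 48-item figure follows from the counts given above, as this short check shows. The variable names are illustrative; the numbers are those stated in the proposal (five states, four items per state per subject, four NAEP items per subject, one fifty-minute period).

```python
# Counts taken from the proposal text.
N_STATES = 5
SUBJECTS = ("reading", "mathematics")
ITEMS_PER_STATE_PER_SUBJECT = 4
NAEP_ITEMS_PER_SUBJECT = 4
CLASS_PERIOD_MINUTES = 50

state_items = N_STATES * ITEMS_PER_STATE_PER_SUBJECT * len(SUBJECTS)
naep_items = NAEP_ITEMS_PER_SUBJECT * len(SUBJECTS)
total_items = state_items + naep_items

# Average time available per item if the whole period is used for testing.
minutes_per_item = CLASS_PERIOD_MINUTES / total_items

print(f"{state_items} state items + {naep_items} NAEP items = {total_items}")
```

At roughly one minute per item, the whole class period would be needed, which bears on whether administration could be non-speeded as intended.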

Coordination of the testing and monitoring of test 
administration in each school would be handled by field staff of 
a national survey organization. 

Scoring and IRT analysis of the resulting data would be 
contracted to an organization with capabilities in this area. 
Each state would also supply this organization with a computer 
tape containing the responses of students to items of its reading 
and mathematics tests administered in current operational 
testing. The latter data would be IRT scored on the common scale 
for purposes of the prototype demonstration of between-state 
comparisons and relating to the NAEP national results. The 
organization or organizations responsible for field testing and 
analysis would produce the prototype report and also submit a 
technical report documenting procedures and discussing any 
significant problems encountered during their work. 

Because of its experimental nature, this proposed trial has 
been held to modest proportions to keep costs low. It is 
estimated that, once the states agree to cooperate and the items 
for the equating test have been assembled, the field work and 
analysis could be carried out by an organization already equipped 
for these activities for about $80,000 of direct costs. 

3. Further Steps 

Procedures for the proposed initial trial are sufficiently 
straightforward that a three month lead time should be enough to 
prepare the test and make arrangements for field testing. 
Another three months should be enough for analysis and 
preparation of the prototype report. If the feasibility trial is 
judged successful, work could begin on an operational system 
involving more subject-matter areas. At that point, it is likely 
that the participating states will wish to move to the second 
strategy for linking based on the development of a common 
equating test. Some of the states might then choose to alter 
their testing programs to conform more closely to the content of 
that test. Such changes, supported by the scale linking through 
the equating test, would further facilitate the comparison of 
educational outcomes among the states and with the nation as a 
whole. 



REFERENCES 

Burstein, L., Baker, E.L., Aschbacher, P. & Keesling, W. (1985). 
Using Test Data for National Indicators of Educational 
Quality: A Feasibility Study. Los Angeles: Center for the 
Study of Evaluation, UCLA Graduate School of Education. 

Lord, F.M. (1980). Applications of Item Response Theory to 
Practical Testing Problems. Hillsdale, NJ: Erlbaum. 






